> Once the code for all destinations lived in a single repo, they could be merged into a single service. With every destination living in one service, our developer productivity substantially improved. We no longer had to deploy 140+ services for a change to one of the shared libraries. One engineer can deploy the service in a matter of minutes.
If you must deploy every service because of a library change, you don't have services; you have a distributed monolith. The entire idea of a "shared library" which must be kept updated across your entire service fleet is antithetical to how you need to treat services.
I think your point, while valid, is probably a lot more nuanced than that. From the post it sounds more akin to an Amazon-style shared build and deployment system than an "every library update forces a redeploy of everything" scenario.
It's likely there's a single source of truth where you pull libraries or shared resources from. When team A wants to update the library-latest pointer to 2.0 but the current reference is still 1.0, everyone needs to migrate off of it; otherwise things will break due to backwards compatibility or whatever.
Likewise, if there's a -need- to remove a version for a vulnerability or what have you, then everyone needs to redeploy, sure, but the benefit of centralization likely outweighs the security risk and the complexity of tracking the patching and deployment process for each and every service individually.
I would say those systems -are- and likely would be classified as micro services but from a cost and ease perspective operate within a shared services environment. I don't think it's fair to consider this style of design decision as a distributed monolith.
By that level of logic, having a singular business entity vs 140 individual business entities for each service would mean it's a distributed monolith.
> It's likely there's a single source of truth where you pull libraries or shared resources from. When team A wants to update the library-latest pointer to 2.0 but the current reference is still 1.0, everyone needs to migrate off of it; otherwise things will break due to backwards compatibility or whatever.
No, this misses one of the biggest benefits of services; you explicitly don't need everyone to upgrade library-latest to 2.0 at the same time. If you do find yourself in a situation where you can't upgrade a core library like e.g. SQLAlchemy or Spring, or the underlying Python/Java/Go/etc runtime, without requiring updates to every service, you are back in the realm of a distributed monolith.
This is explicitly called out in the blog post in the trade-offs section.
I was one of the engineers who helped make the decisions around this migration. There is no one size fits all. We believed in that thinking originally, but after observing how things played out, decided to make different trade-offs.
To me it sounds like so: "We realized that we were not running microservice architecture, but rather a distributed monolith, so it made sense to make it a regular monolith". It's a decision I would wholeheartedly agree with.
I don't think you read the post carefully enough: they were not running a distributed monolith, and every service was using different dependencies (versions of them).
This meant that it was costly to maintain and caused a lot of confusion, especially with internal dependencies (shared libraries): this is the trade-off they did not like and wanted to move away from.
They moved away from this in multiple steps, first one of those being making it a "distributed monolith" (as per your implied definition) by putting services in a monorepo and then making them use the same dependency versions (before finally making them a single service too).
I think the blog post is confusing in this regard. For example, it explicitly states:
> We no longer had to deploy 140+ services for a change to one of the shared libraries.
Taken in isolation, that is a strong indicator that they were indeed running a distributed monolith.
However, the blog post earlier on said that different microservices were using different versions of the library. If that was actually true, then they would never have to deploy all 140+ of their services in response to a single change in their shared library.
Take a shared telemetry library: you realize that you are missing an important metric to operationalize your services. You now need to deploy all 140 to get the benefit.
Your runtime version is out of date / end of life. You now need to update and deploy all 140 (or at least all the ones that use the same tech stack).
No matter how you slice it, there are always dependencies across all services because there are standards in the environment in which they operate, and there are always going to be situations where you have to redeploy everything or large swaths of things.
Microservices aren’t a panacea. They just let you delay the inevitable but there is gonna be a point where you’re forced to comply with a standard somewhere that changes in a way that services must be updated. A lot of teams use shared libraries for this functionality.
These are great examples. I'll add one more. Object names and metadata definitions. Figuring out what the official name for something is across systems, where to define the source of truth, and who maintains it.
Why do all services need to understand all these objects though? A service should as far as possible care about its own things and treat other services' objects as opaque.
... otherwise you'd have to do something silly like update every service every time that library changed.
As you mention, it said early on that they were using different versions for each service:
> Eventually, all of them were using different versions of these shared libraries.
I believe the need to deploy 140+ services came out of wanting to fix this by using the latest version of the deps everywhere, and to then stay on top of it so it does not deteriorate in the same way (and possibly when they had things like a security fix).
The blog post says that they had a microservice architecture, then introduced some common libraries which broke the assumptions of compatibility across versions, forcing mass updates if a common dependency was updated. This is when they realized that they were no longer running a microservice architecture, and fused everything into a proper monolith. I see no contradiction.
Which is sort of fine, in my book. Update to the latest version of dependencies opportunistically, when you introduce other changes and roll your nodes anyway. Because you have well-defined, robust interfaces between the microservices, such that they don't break when a dependency far down the stack changes, right?
If a change requires cascading changes in almost every other service then yes, you're running a distributed monolith and have achieved zero separation of services. Doesn't matter if each "service" has a different stack if they are so tightly coupled that a change in one necessitates a change in all. This is literally the entire point of micro-services. To reduce the amount of communication and coordination needed among teams. When your team releases "micro-services" which break everything else, it's a failure and hint of a distributed monolith pretending to be micro-services.
As I said, they mention having a problem where each service depended on different versions of internal shared libraries. That indicates they did not need to update all at once:
> When pressed for time, engineers would only include the updated versions of these libraries on a single destination’s codebase.
> Over time, the versions of these shared libraries began to diverge across the different destination codebases.
> ...
> Eventually, all of them were using different versions of these shared libraries.
FWIW, I think it was a great write up. It's clear to me what the rationale was and had good justification. Based on the people responding to all of my comments, it is clear people didn't actually read it and are opining without appropriate context.
Totally agree. For what it's worth, based on the limited information in the article, I actually do think it was the right decision to pull all of the per-destination services back into one. The shared library problem can go both ways, after all: maybe the solution is to remove the library so your microservices are fully independent, or maybe they really should have never been independent in the first place and the solution is to put them back together.
I don't think either extreme of "every line of code in the company is deployed as one service" or "every function is an independent FaaS" really works in practice, it's all about finding the right balance, which is domain-specific every time.
Having seen similar patterns play out at other companies, I'm curious about the organizational dynamics involved. Was there a larger dev team at the time you adopted microservices? Was there thinking involved like "we have 10 teams, each of which will have strong, ongoing ownership of ~14 services"?
Because from my perspective that's where microservices can especially break down: attrition or layoffs resulting in service ownership needing to be consolidated between fewer teams, which now spend an unforeseen amount of their time on per-service maintenance overhead. (For example, updating your runtime across all services becomes a massive chore, one that is doable when each team owns a certain number of services, but a morale-killer as soon as some threshold is crossed.)
I disagree. Both can be true at the same time. A good design should not point to library-latest in a production setting; it should point to a stable, known-good version via a direct reference, i.e. library-1.0.0-stable.
However, in the world we live in, people choose to point to latest, to avoid manual work, and trust that other teams did their due diligence when updating to the latest version.
You can point to a stable version in the model I described and still be distributed and a micro service, while depending on a shared service or repository.
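To make the two referencing strategies concrete, here's a rough, purely illustrative Python sketch; the toy registry contents and the resolve helper are invented, not any real tooling:

```python
# Hypothetical internal registry: version -> artifact filename. Purely illustrative.
REGISTRY = {
    "shared-lib": {
        "1.0.0": "shared-lib-1.0.0.tar.gz",
        "2.0.0": "shared-lib-2.0.0.tar.gz",  # backwards-incompatible release
        "latest": "2.0.0",                   # floating pointer maintained by another team
    }
}

def resolve(name: str, ref: str) -> str:
    """Resolve a dependency reference to a concrete artifact filename."""
    versions = REGISTRY[name]
    if ref == "latest":
        # Floating reference: whatever another team last published wins,
        # so consumers get silently moved to 2.0.0 on their next build.
        ref = versions["latest"]
    # Pinned reference: this consumer upgrades on its own schedule.
    return versions[ref]

print(resolve("shared-lib", "latest"))  # shared-lib-2.0.0.tar.gz (moves under you)
print(resolve("shared-lib", "1.0.0"))   # shared-lib-1.0.0.tar.gz (stable, known good)
```

The pinned form is what keeps each service independently deployable; the floating form is what turns one team's publish into everyone's deploy.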
You can do that but you keep missing that you’re no longer a true microservice as originally defined and envisioned, which is that you can deploy the service independently under local control.
Can you imagine if Google could only release a new API if all their customers simultaneously updated to that new API? You need loose coupling between services.
OP is correct that you are indeed now in a weird hybrid monolith application where it’s deployed piecemeal but can’t really be deployed that way because of tightly coupled dependencies.
Be ready for a blog post in ten years how they broke apart the monolith into loosely coupled components because it was too difficult to ship things with a large team and actually have it land in production without getting reverted to an unrelated issue.
Internal and external have wildly different requirements. Google internally can't update a library unless the update is either backward-compatible for all current users or part of the same change that updates all those users, and that's enforced by the build/test harness. That was an explicit choice, and I think an excellent one, for that scenario: it's more important to be certain that you're done when you move forward, so that it's obvious when a feature no longer needs support, than it is to enable moving faster in "isolation" when you all work for the same company anyway.
But also, you're conflating code and services. There's a huge difference between libraries that are deployed as part of various binaries and those that are used as remote APIs. If you want to update a utility library that's used by importing code, then you don't need simultaneous deployment, but you would like to update everywhere to get it done with - that's only really possible with a monorepo. If you want to update a remote API without downtime, then you need a multi-phase rollout where you introduce a backward-compatibility mode... but that's true whether you store the code in one place or two.
The whole premise of microservices is loose coupling - external just makes it plainly obvious that it’s a non starter. If you’re not loosely coupling you can call it microservices but it’s not really.
Yes I understand it’s a shared library but if updating that shared library automatically updates everyone and isn’t backward compatible you’re doing it wrong - that library should be published as a v2 or dependents should pin to a specific version. But having a shared library that has backward incompatible changes that is automatically vendored into all downstream dependencies is insane. You literally wouldn’t be able to keep track of your BOM in version control as it obtains a time component based on when you built the service and the version that was published in the registry.
> if updating that shared library automatically updates everyone and isn't backward compatible you're doing it wrong - that library should be published as a v2 or dependents should pin to a specific version
...but why? You're begging the question.
If you can automatically update everyone including running their tests and making any necessary changes to their code, then persisting two versions forever is a waste of time. If it's because you can't be certain from testing that it's actually a safe change, then fine, but note that that option is still available to you by copy/pasting to a v2/ or adding a feature flag. Going to a monorepo gives you strictly more options in how to deal with changes.
> You literally wouldn’t be able to keep track of your BOM in version control as it obtains a time component based on when you built the service
This is true regardless of deployment pattern. The artifact that you publish needs to have pointers back to all changes that went into it/what commit it was built at. Mono vs. multi-repo doesn't materially change that, although I would argue it's slightly easier with a monorepo since you can look at the single history of the repository, rather than having to go an extra hop to find out what version 1.0.837 of your dependency included.
> the version that was published in the registry
Maybe I'm misunderstanding what you're getting at, but monorepo dependencies typically don't have a registry - you just have the commit history. If a binary is built at commit X, then all commits before X across all dependencies are included. That's kind of the point.
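FWIW, the "pointer back to the commit it was built at" can be as simple as stamping the artifact at build time. A rough sketch, assuming the build runs inside a git checkout (the build-info.json filename is made up):

```python
import json
import subprocess
from datetime import datetime, timezone

def build_metadata() -> dict:
    """Record which commit a binary was built from, so the BOM question
    ('what exactly went into this artifact?') is answered by the repo history."""
    commit = subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True
    ).strip()
    return {
        "commit": commit,
        "built_at": datetime.now(timezone.utc).isoformat(),
    }

if __name__ == "__main__":
    # Write alongside the artifact; in a monorepo, this one commit hash
    # identifies the state of every in-repo dependency at build time.
    with open("build-info.json", "w") as f:
        json.dump(build_metadata(), f, indent=2)
```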
> ...but why? You're begging the question.
> If you can automatically update everyone including running their tests and making any necessary changes to their code, then persisting two versions forever is a waste of time.
I’m not begging the question. I’m simply stating what loose coupling looks like and the blog post is precisely the problem of tight coupling. If you have multiple teams working on a tightly coupled system you’re asking for trouble. This is why software projects inevitably decompose against team boundaries and you ship your org chart - communication and complexity is really hard to manage as the head count grows which is where loose coupling helps.
But this article isn’t about moving from federated codebases to a single monorepo as you propose. They used that as an intermediary step to then enable making it a single service. But the point is that making a single giant service is well studied and a problem. Had this constantly at Apple when I worked on CoreLocation where locationd was a single service that was responsible for so many things (GPS, time synchronization of Apple Watches, WiFi location, motion, etc) that there was an entire team managing the process of getting everything to work correctly within a single service and even still people constantly stepped on each other’s toes accidentally and caused builds that were not suitable. It was a mess and the team that should have identified it as a bottleneck in need of solving (ie splitting out separate loosely coupled services) instead just kept rearranging deck chairs.
> Maybe I'm misunderstanding what you're getting at, but monorepo dependencies typically don't have a registry - you just have the commit history
I’m not opposed to a monorepo which I think may be where your confusion is coming from. I’m suggesting slamming a bunch of microservices back together is a poorly thought out idea because you’ll still end up with a launch coordination bottleneck and rolling back 1 team’s work forces other teams to roll back as well. It’s great the person in charge got to write a ra ra blog post for their promo packet. Come talk to me in 3 years with actual on the ground engineers saying they are having no difficulty shipping a large tightly coupled monolithic service or that they haven’t had to build out a team to help architect a service where all the different teams can safely and correctly coexist. My point about the registry is that they took one problem - a shared library multiple services depend on through a registry depend on latest causing problems deploying - and nuked it from orbit using a monorepo (ok - this is fine and a good solution - I can be a fan of monorepos provided your infrastructure can make it work) and making a monolithic service (probably not a good idea that only sounds good when you’re looking for things to do).
> I’m not begging the question. I’m simply stating what loose coupling looks like and the blog post is precisely the problem of tight coupling.
But it is not! They were updating dependencies and deploying services separately, and this led to every one of 140 services using a different version of "shared-foo". This made it cumbersome, confusing and expensive to keep going (you want a new feature from shared-foo, you have to take all the other features, unless you fork and cherry-pick on top, which makes it not a shared-foo anymore).
The point is that a true microservice approach will always lead to exactly this situation: a) you do not extract shared functions and live with duplicate implementations, b) you enforce keeping your shared dependencies always on very-close-to-latest (which you can do with different strategies; a monorepo is one that enables but does not require it), or c) you end up with a mess of versions being used by each individual service.
The most common middle ground is to insist on backwards compatibility in a shared-lib, but carrying that over 5+ years is... expensive. You can mix it with an "enforce update" approach ("no version older than 2 years can be used"), but all the problems are pretty evident and expected with any approach.
I'd always err on the side of having the capability to upgrade everything at once if needed, while keeping the ability to keep a single service on a pinned version. This is usually not too hard with any approach, though a monorepo makes the first one appear easier (you edit one file, or multiple dep files in a single repo). But unless you can guarantee that all services get replaced in a deployment at exactly the same moment (which you rarely can), or can accept short-lived inconsistencies, deployment requires all services to be backwards compatible until they are all updated, with either approach.
I'd also say that this is still not a move to a monolith, but to a Service-Oriented-Architecture that is not microservices (as microservices are also SOA): as usual, the middle ground is the sweet spot.
To reference my other comment. This thread is about the nuance of if a dependency on a shared software repository means you are a microservice or not. I'm saying it's immaterial to the definition.
A dependency on an external software repository does not make a microservice no longer a microservice. It's the deployment configuration around said dependency that matters.
What everyone else is saying is that the core value proposition of microservices is that they are independently deployable (which I believe is what you are aiming for as well), which means that there is no tight coupling between them.
If one introduces tight coupling by having a shared library that gets updated in backwards incompatible way and needs to be updated simultaneously in each microservice, you move away from a microservices architecture as your services are not independently deployable anymore.
So in the general case, it is immaterial, but in practice, it can be a mechanism which introduces tight coupling and negates the core value of the microservices architecture.
Here, it was done on purpose as a step to a more monolithic architecture (though it was still only a single service in a larger system, so I'd avoid the "monolith" term).
> Be ready for a blog post in ten years how they broke apart the monolith into loosely coupled components because it was too difficult to ship things with a large team and actually have it land in production without getting reverted to an unrelated issue.
For some of their "solutions" I kind of wonder how they plan on resolving things, like the black-box "magic" queue service they subbed back in, or the fault tolerance problem.
That said, I do think if you have a monolith that just needs to scale (single service that has to send to many places), they are possibly taking the correct approach. You can design your code/architecture so that you can deploy "services" separately, in a fault tolerant manner, but out of a mono repo instead of many independent repos.
They don't have a monolith: they have a service that has a restricted domain of responsibility matched to the team that runs it.
There is nothing magic about their queue service, and it seems correctly tuned to the complexity that they've got to cover: yes, just like most queue implementations, it will get different types of messages (events). If anything, their previous implementation was too complex which caused lots of waste.
With hindsight, they should have evolved their original architecture into exactly what they pivoted to now: better fault tolerance in "processors" of different types.
I would hope that my general rule of "only solve exactly the problem you have in front of you" would have avoided the approach they took, but engineers love to abstract away things and introduce indirection layers and add accidental complexity that way. And ofc, "microservices great, me want microservices" too :)
Again, I am not saying this as a slight: I believe many of us have learned the limits of microservices by, well, living through them :) And now we tune our abstraction layers differently.
> Can you imagine if Google could only release a new API if all their customers simultaneously updated to that new API? You need loose coupling between services.
Internal Google services: *sweating profusely*
(Mostly in jest, it's obviously a different ballgame internal to the monorepo on borg)
You're both right, but talking past each other. You're right that shared dependencies create a problem, but it can be the problem without semantically redefining the services themselves as a shared monolith. Imagine someone came to you with a similar problem and you concluded "distributed monolith", which may lead them to believe that their services should be merged into a single monolith. What if they then told you that it's going to be tough because these were truly separate apps, but that used the same OS wide Python install, one ran on Django/Postgres, another on Flask/SQLite, and another was on Fastapi/Mongo, but they all relied on some of the same underlying libs that are frequently updated. The more accurate finger should point to bad dependency management and you'd tell them about virtualenv or docker.
The dependencies they're likely referring to aren't core libraries, they're shared interfaces. If you're using protobufs, for instance, and you share the interfaces in a repo, updating Service A's interface(s) necessitates that all services communicating with it be updated as well (whether you utilize those changes or not). Generally, for larger systems built by smaller/scrappier teams, a true dependency management tree for something like this is out of scope, so they just redeploy everything in a domain.
> If you're using protobufs, for instance, and you share the interfaces in a repo, updating Service A's interface(s) necessitates that all services communicating with it be updated as well (whether you utilize those changes or not).
This is not true! This is one of the core strengths of protobuf. Non-destructive protobuf changes, such as adding new API methods or new fields, do not require clients to update. On the server-side you do need to handle the case when clients don't send you the new data--plus deal with the annoying "was this int64 actually set to 0 or is it just using the default?" problem--but as a whole you can absolutely independently update a protobuf, implement it on the server, and existing clients can keep on calling and be totally fine.
Now, that doesn't mean you can go crazy, as doing things like deleting fields, changing field numbering or renaming APIs will break clients, but this is just the reality of building distributed systems.
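To sketch the server-side handling described above without generated protobuf classes, here is a rough Python approximation of the same tolerant-server idea against an already-decoded message; the message and field names are invented, and proto3's field-presence subtleties are only approximated with None:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChargeRequest:
    # Fields every existing client already sends.
    amount_cents: int
    currency: str
    # Field added in a later revision of the message; old clients never
    # send it, so the server has to treat "absent" as a valid state.
    retry_count: Optional[int] = None

def handle_charge(raw: dict) -> str:
    req = ChargeRequest(
        amount_cents=raw["amount_cents"],
        currency=raw["currency"],
        retry_count=raw.get("retry_count"),  # tolerate old clients
    )
    retries = req.retry_count if req.retry_count is not None else 0
    return f"charging {req.amount_cents} {req.currency} (retries={retries})"

# An old client that has never heard of retry_count keeps working:
print(handle_charge({"amount_cents": 1299, "currency": "USD"}))
# A newer client can opt in to the new field:
print(handle_charge({"amount_cents": 1299, "currency": "USD", "retry_count": 2}))
```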
What you are talking about is simply keeping the API (whether a library or a service) backwards-compatible. There are plenty strategies to achieve that, and it can be done with almost any interface layer (HTTP, protobuf, JSON, SQL, ...).
I was oversimplifying for the sake of example, but yes you are correct. Properly managed protobufs don't require an update on strict interface expansion; so shouldn't always require a redeploy.
No, that sounds like you're breaking backwards compatibility too often.
Assuming there is an update every week, you can expect all teams to update within two weeks, which means if everything goes well only two versions are active.
That's not how things play out in practice. Say you do backwards incompatible changes "infrequently", say every 4 months, or 3 times per year. In 5 years, that's 15 versions with backwards incompatible changes.
Everybody is time pressured at least sometimes, and you miss one update, and now you've got multiple backwards-incompatible updates and you need to do it carefully the next time around, meaning more time needed, meaning it needs to be scheduled and planned along with other feature work now.
And then you end up with something close to a normal distribution of versions across ~140 services: some have the very old versions (v1-v4), majority are on some middle, not ancient versions but still ~7 versions behind on average (v5-v10), and only some are on the latest few versions (v11-v15). "Patch" versions can become even crazier, and yes, there will be bugs in them making them inadvertently not backwards compatible either (as not everybody updated right away to detect it).
But really, I always point to good, long-lived APIs that make compromises in their API for the sake of backwards compatibility (e.g. we still live with "Referer" instead of "Referrer" in HTTP, 35 years later; and it is OK!).
> If you do find yourself in a situation where you can't upgrade a core library like e.g. SQLAlchemy or Spring, or the underlying Python/Java/Go/etc runtime, without requiring updates to every service, you are back in the realm of a distributed monolith.
Show me a language runtime or core library that will never have a CVE. Otherwise, by your definition, microservices don’t exist and all service oriented architectures are distributed monoliths.
Yes, you’re describing a distributed monolith. Microservices are independent, with nothing shared. They define a public interface and that’s it, that’s the entire exposed surface area. You will need to do major version bumps sometimes, when there are backwards incompatible changes to make, but these are rare.
The logical problem you’re running into is exactly why microservices are such a bad idea for most businesses. How many businesses can have entirely independent system components?
Almost all “microservice” systems in production are distributed monoliths. Real microservices are incredibly rare.
A mental model for true microservices is something akin to depending on the APIs of Netflix, Hulu, HBO Max and YouTube. They’ll have their own data models, their own versioning cycles and all that you consume is the public interface.
I'm trying to understand what you see as a really independent service with nothing shared.
For instance, say company A uses part of the GCP logging stack, and company B does the same. GCP updates its product in a way that strongly encourages upgrading within a specific time frame (e.g. the price will drastically increase otherwise), so A and B do it mostly at the same time for the same reason.
Are A and B truly independent under your vision? Or are they a company-spanning monolith?
This Segment team was 3 people and 140 services. Microservices are best at solving org coordination issues where teams step on each other. This is a case of a team stepping on itself.
This type of elitist mentality is such a problem and such a drain for software development. "Real microservices are incredibly rare". I'll repeat myself from my other post: by this level of logic, nothing is a microservice.
Do you depend on a cloud provider? Not a microservice. Do you depend on an ISP for Internet? Not a microservice. Depend on humans to do something? Not a microservice.
Textbook definitions and reality rarely coincide, rather than taking such a fundamentalist approach that leads nowhere, recognize that for all intents and purposes, what I described is a microservice, not a distributed monolith.
Yes, the user I'm replying to is suggesting that taking on a dependency of a shared software repository makes the service no longer a microservice.
That is fundamentally incorrect. As presented in my other post you can correctly use the shared repository as a dependency and refer to a stable version vs a dynamic version which is where the problem is presented.
The problem with having a shared library which multiple microservices depend on isn’t on the microservice side.
As long as the microservice owners are free to choose what dependencies to take and when to bump dependency versions, it’s fine - and microservice owners who take dependencies like that know that they are obliged to take security patch releases and need to plan for that. External library dependencies work like that and are absolutely fine for microservices to take.
The problem comes when you have a team in the company that owns a shared library, and where that team needs, in order to get their code into production, to prevail upon the various microservices that consume their code to bump versions and redeploy.
That is the path to a distributed monolith situation and one you want to avoid.
Yes we are in agreement. A dependency on an external software repository does not make a microservice no longer a microservice. It's the deployment configuration around said dependency that matters.
"by this level of logic nothing is a micro service"
Yes, exactly. The point is not elitism. Microservices are a valuable tool for a very specific problem but what most people refer to as "microservices" are not. Language is important when designing systems. Microservices are not just a bunch of separately deployable things.
The "micro" in "microservice" doesn't refer to how it is deployed, it refers to how the service is "micro" in responsibility. The service has a public interface defined in a contract that other components depend on, and that is it; what happens within the service is irrelevant to the rest of the system and vice versa, the service does not have to depend on knowledge of the rest of the system. By virtue of being micro in responsibility, it can be deployed anywhere and anyhow.
If it is not a microservice, it is just a service, and when it is just a service, it is probably a part of a distributed monolith. And that is okay, a distributed monolith can be very valuable. The reason many people bristle at the mention of microservices is that they are often seen as an alternative to a monolith but they are not, it is a radically different architecture.
We must be precise in our language because if you or I build a system made up of "microservices" that aren't microservices, we're taking on all of the costs of microservices without any of the benefits. You can choose to drive to work, or take the bus, but you cannot choose to drive because it is the cheapest mode of transport or walk because it is the fastest. The costs and benefits are not independent.
The worst systems I have ever worked on were "microservices" with shared libraries. All of the costs of microservices (every call now involves a network) and none of the benefits (every service is dependent on the others). The architect of that system had read all about how great microservices are and understood it to mean separately deployable components.
There is no hierarchy of goodness, we are just in pursuit of the right tool or the job. A monolith, distributed monolith or a microservice architecture could be the right tool for one problem and the wrong tool for another.
> The "micro" in "microservice" doesn't refer to how it is deployed, it refers to how the service is "micro" in responsibility.
The "micro" in microservice was a marketing term to distinguish it from the bad taste of particular SOA technology implementations in the 2000s. A similar type of activity as crypto being a "year 3000 technology."
The irony is it was the common state that "services" weren't part of a distributed monolith. Services which were too big were still separately deployable. When services became nothing but an HTTP interface over a database entity, that's when things became complicated via orchestration; orchestration previously done by a service... not done to a service.
I am talking about using a shared software repository as a dependency. Which is valid for a microservice. Taking said dependency does not turn a microservice into a monolith.
It may be a build time dependency that you do in isolation in a completely unrelated microservice for the pure purpose of building and compiling your business microservice. It is still a dependency. You cannot avoid dependencies in software or life. As Carl Sagan said, to bake an apple pie from scratch, you must first invent the universe.
>The worst systems I have ever worked on were "microservices" with shared libraries.
Ok? How is this relevant to my point? I am only referring to the manner in which your microservice is referencing said libraries. Not the pros or cons of implementing or using shared libraries (e.g. mycompany-specific-utils), common libraries (e.g. apache-commons), or any software component for that matter.
>Yes, exactly
So you're agreeing that there is no such thing as a microservice. If that's the case, then the term is pointless other than a description of an aspirational yet unattainable state. Which is my point exactly. For the purposes of the exercise described the software is a microservice.
> Taking said dependency does not turn a microservice into a monolith.
True. However one of the core tenets of microservices is that they should be independently deployable[1][2].
If taking on such a shared dependency does not interfere with them being independently deployable then all is good and you still have a set of microservices.
However, if that shared dependency couples the services so that when one needs a new version of the shared dependency they all do, then suddenly those services are no longer microservices but a distributed monolith.
And if my grandmother had wheels she would be a bike
There are categories, and ontologies are real things in the world. If you create one thing and call it something else, that doesn't mean the definition of "something else" should change.
By your definition it is impossible to create a state based on coherent specifications because most states don’t align to the specification.
We know for a fact that’s wrong via functional programming, state machines, and formal verification
While you're right, I can only think of twice in my career where there was a "code red all services must update now", which were log4shell and spectre/meltdown (which were a bit different anyway). I just don't think this comes up enough in practice to be worth optimizing for.
You have not been in the field very long then, I presume? There are multiple per year that require all hands on deck, depending on your tech stack. Just look at the recent NPM supply chain attacks.
The npm supply chain attacks were only an issue if you don't use lock files. In fact they were a great example of why you shouldn't blindly upgrade to the latest packages when they are available.
Fair enough, which is why I called out my assumption:).
I'm referring to the all hands on deck nature of responding to security issues not the best practice. For many, the NPM issue was an all hands on deck.
Wait what? I've been wondering why people have been fussing over supply chain vulnerabilities, but I thought they mostly meant "we don't want to get unlucky and upgrade, merge the PR, test, and build the container before the malicious commit is pushed".
Who doesn't use lockfiles? Aren't they the default everywhere now? I really thought npm uses them by default.
We use pretty much the entire nodejs ecosystem, and only the very latest Next.js vulnerability was an all hands on deck vulnerability. That’s taken over the past 7 years.
To add to this conversation from our other thread, you solve a bunch of problems that are nearly just as bad by not using microservices, yet you still do. And that is the same reason why people use JavaScript despite the issues it introduces. It's not like you're the only person in the industry who hasn't used a technology that irrationally introduces horrible consequences.
I mean I just participated in a Next JS incident that required it this week.
It has been rare over the years but I suspect it's getting less rare as supply chain attacks become more sophisticated (hiding their attack more carefully than at present and waiting longer to spring it).
A library which patches a security vulnerability should do so by bumping a patch version, maintaining backward compatibility. Taking a patch update to a library should mean no changes to your code, just rerun your tests and redeploy.
If libraries bump minor or major versions, they are imposing work on all the consuming services to accept the version, make compatibility changes, test and deploy.
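As a rough sketch of that policy (plain semantic version strings and an invented helper, not any particular tool's API):

```python
def parse(version: str) -> tuple[int, int, int]:
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch

def is_drop_in_upgrade(current: str, candidate: str) -> bool:
    """A patch bump (same major.minor) should be safe to take by just
    re-running tests and redeploying; anything else imposes real work
    on every consuming service."""
    cur_major, cur_minor, _ = parse(current)
    cand_major, cand_minor, _ = parse(candidate)
    return (cand_major, cand_minor) == (cur_major, cur_minor)

print(is_drop_in_upgrade("1.4.2", "1.4.3"))  # True: security patch, just redeploy
print(is_drop_in_upgrade("1.4.2", "1.5.0"))  # False: minor bump, compatibility work expected
print(is_drop_in_upgrade("1.4.2", "2.0.0"))  # False: major bump, definitely work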
This is pedantic, but no, it doesn't need to be updated everywhere. It should be updated as fast as possible, but there isn't a dependency chain there.
It’s easy to say things like this but also incredibly difficult to know if you’ll introduce subtle bugs or incompatibilities between services. It’s an example of people following the microservices pattern and then taking on additional risk or deployment problems that are not immediately obvious when buying into this!
So let’s say you have a shared money library that you have fixed a bug in… what would you do in the real world - redeploy all your services that use said library or something else?
> It’s easy to say things like this but also incredibly difficult to know if you’ll introduce subtle bugs or incompatibilities between services.
You are right: it is difficult. It is harder than building a monolith. No argument there. I just don't think proper microservices are as difficult as people think. It's just more of a mindshift.
Plenty of projects and companies continue to release backwards compatible APIs: operating systems, Stripe/PayPal, cloud providers. Bugs come up, but in general people don't worry about ec2:DescribeInstances randomly breaking. These projects are still evolving internally while maintaining a stable external API. It's a skill, but something that can be learned.
> So let’s say you have a shared money library that you have fixed a bug in… what would you do in the real world - redeploy all your services that use said library or something else?
In the real world I would not have a shared "money library" to begin with. If there were money-related operations that needed to be used by multiple services, I would have a "money service" which exposed an API and could be deployed independently. A bug fix would then be a deploy to this service, and no other services would have to update or be aware of the fix.
This isn't a theoretical, either, as a "payments service" that encapsulates access to payment processors is something I've commonly seen.
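A hypothetical sketch of that shape using Flask; the endpoint and the rounding rule are invented for illustration, not a claim about how any real payments service is built:

```python
from decimal import Decimal, ROUND_HALF_UP
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.post("/v1/round")
def round_amount():
    """Centralized money rounding: callers send an amount and currency,
    and never need to redeploy when the rounding rules change."""
    payload = request.get_json()
    amount = Decimal(str(payload["amount"]))
    # Illustrative rule: round to 2 decimal places, half up.
    rounded = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return jsonify({"amount": str(rounded), "currency": payload["currency"]})

if __name__ == "__main__":
    app.run(port=8080)
```

A bug fix here is one deploy of this service; consumers only ever see the HTTP contract.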
> In the real world I would not have a shared "money library" to begin with. If there were money-related operations that needed to be used by multiple services, I would have a "money service" which exposed an API and could be deployed independently.
Depending on what functionality the money service handles, this could become a problem.
For example, one shared-library-style function I've seen in the past is rounding (to make sure all of the rounding rules are handled properly based on configs etc.). An HTTP call for every single low-level rounding operation would quickly become a bottleneck.
But really, a shared "money library" is exactly the same thing as a shared "money service" if everyone is using the same, latest version (which is easier to enforce with a networked "service").
The difference is in what's easy and what's hard. With a library, it's easy for everyone to run a different version, and hard for everyone to run the same version. With a service, it's easy for everyone to use the same version, and harder to use a different one (eg. creating multiple environments, and especially ephemeral "pull request" environments where you can mix and match for best automated integration and e2e testing).
But you can apply the same backwards-compatible API design patterns to a library that you would be applying to a service: no difference really. It's only about what's the time to detection when you break these patterns (with a library, someone finds out 2 years later when they update; with a service, they learn right away).
From the perspective of change management, what’s the difference between a shared library and an internal service relied on by multiple other services?
You still need to make sure changes don’t have unintended consequences downstream
I was coming here to say this. That the whole idea of a shared library couples all those services together. Sounds like someone wanted to be clever and then included their cleverness all over the platform. Dooming all services together.
Decoupling is the first part of microservices. Pass messages. Use json. I shouldn’t need your code to function. Just your API. Then you can be clever and scale out and deploy on saturdays if you want to and it doesn’t disturb the rest of us.
> Pass messages. Use json. I shouldn’t need your code to function. Just your API.
Yes, but there’s likely a lot of common code related to parsing those messages, interpreting them, calling out to other services etc. shared amongst all of them. That’s to be expected. The question is how that common code is structured if everything has to get updated at once if the common code changes.
Common code that’s part of your standard library, sure. Just parse the json. Do NOT introduce some shared class library that “abstracts” that away. Instead use versioning of schemas like another commenter said. Use protobuf. Use Avro. Use JSON. Use Swagger. Use something other than POCO/POJO shared library that you have to redeploy all your services because you added a Boolean to the newsletter object.
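As a rough sketch of the consuming side, with a hypothetical newsletter object: the consumer only picks out the fields it knows about, so a producer adding a Boolean doesn't force anyone to redeploy.

```python
import json

def parse_newsletter(raw: str) -> dict:
    """Deserialize only the fields this service cares about and ignore
    anything it doesn't recognize (a 'tolerant reader')."""
    data = json.loads(raw)
    return {
        "id": data["id"],
        "subject": data["subject"],
    }

# The producer deployed a new version that adds a Boolean; this consumer
# keeps working without a rebuild or redeploy.
message = '{"id": 42, "subject": "October issue", "double_opt_in": true}'
print(parse_newsletter(message))  # {'id': 42, 'subject': 'October issue'}
```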
This right here. WTF do you do when you need to upgrade your underlying runtime such as Python, Ruby, whatever ¯\_(ツ)_/¯ you gotta go service by service.
If needs be. Or, you upgrade the mission critical ones and leave the rest for when you pick them up again. If your culture is “leave it better than when you found it” this is a non issue.
The best is when you use containers and build against the latest runtimes in your pipelines so as to catch these issues early and always have the most up to date patches. If a service hasn’t been updated or deployed in a long time, you can just run another build and it will pull latest of whatever.
The opposite situation of needing to upgrade your entire company's codebase all at once is much more painful. With services you can upgrade runtimes on an as-needed basis. In monoliths, runtime upgrades were massive projects that required a ton of coordination between teams and months or years of work.
That 3rd party library rarely gets updated whereas Jon’s commit adds a field and now everyone has to update or the marshaling doesn’t work.
Yes, there are scenarios where you have to deploy everything but when dealing with micro services, you should only be deploying the service you are changing. If updating a field in a domain affects everyone else, you have a distributed monolith and your architecture is questionable at best.
The whole point is I can deploy my services without relying on yours, or touching yours, because it sounds like you might not know what you’re doing. That’s the beautiful effect of a good micro service architecture.
I was trying to think of better terminology. Perhaps this works:
Two services can have a common dependency, which still leaves them uncoupled. An example would be a JSON schema validation and serialization/deserialization library. One service can in general bump its dependency version without the other caring, because it'll still send and consume valid JSON.
Two services can have a shared dependency, which couples them. If one service needs to bump its version the other must also bump its version, and in general deployment must ensure they are deployed together so only one version of the shared dependency is live, so to speak. An example could be a library containing business logic.
If you had two independent microservices and added a shared library as per my definition above, you've turned them into a distributed monolith.
Sometimes a common dependency might force a shared deployment, for example a security bug in the JSON library. However that is an exception, and unlike the business logic library. In the shared library case the exception is that one could be bumped without the other caring.
The third party shared library doesn't know your company exists. This means the third party dependency doesn't contain any business or application specific code and is applicable to any software project. This in turn means it has to solve the majority of business use cases ahead of time and be thoroughly tested to not break any consumers.
The problem has fundamentally gone away and reduced itself to a simple update problem, which itself is simpler because the update schedule is less frequent.
I use Tomcat for all web applications. When Tomcat updates I just need to bump the version number on one application and move on to the next. Tomcat does not involve itself in the data that is being transferred in a non-generic way, so I can update whenever I want.
Since nothing blocks updates, the updates happen frequently, which means no application is running on an ancient Tomcat version.
Yeah this seems very much not a microservices setup.
I don't pretend proper microservices are a magic solution... but if you break the rules / system of microservices, that's not "microservices" being bad, that's just creating problems for yourself.
You can keep using an older version for a while. You shouldn't need to redeploy everything at once. If you can't keep using the older version, you did it wrong.
And ideally, your logging library should rarely need to update. If you need unique integrations per service, use a plug-in architecture and keep the plug-ins local to each service.
I wasn't taking into account the velocity of a fleet-wide rollout; I agree you can migrate over time. However, I was focusing on the idea that any type of fleet-wide rollout for a specific change was somehow "bad."
While I think that's a bit harsh :-) the sentiment of "if you have these problems, perhaps you don't understand systems architecture" is kind of spot on. I have heard people scoff at a bunch of "dead legacy code" in the Windows APIs (as an example) without understanding the challenge of moving millions of machines, each at different places in the evolution timeline, through to the next step in the timeline.
To use an example from the article, there was this statement: "The split to separate repos allowed us to isolate the destination test suites easily. This isolation allowed the development team to move quickly when maintaining destinations."
This is architecture bleed-through. The format produced by Twilio "should" be the canonical form, which is submitted to the adapter, which mangles it into the "destination" form. Great, that transformation is expressible semantically in a language that takes the canonical form and spits out the special form. Changes to the transformation expression should not "bleed through" to other destinations, and changes to the canonical form should be backwards compatible to prevent bleed-through of changes in the source from impacting the destination. At all times, if something worked before, it should continue to work without touching it because the architecture boundaries are robust.
Being able to work with a team that understood this was common "in the old days" when people were working on an operating system. The operating system would evolve (new features, new devices, new capabilities) but because there was a moat between the OS and applications, people understood that they had to architect things so that the OS changes would not cause applications that currently worked to stop working.
I don't judge Twilio for not doing robust architecture. I was astonished, when I went to work at Google, at how lazy everyone got when the entire system is under their control (like there are no third-party apps running in the fleet). There was a persistent theme of some bright person "deciding" to completely change some interface and Wham! every other group at Google had to stop what they were doing and move their code to the new thing. There was a particularly poor 'mandate' on a new version of their RPC while I was there. As Twilio notes, that can make things untenable.
Agreed. It sounds like they never made it to the distributed architecture they would have benefited from. That said, if the team thrives on a monolithic one they made the right choice.
Then every microservice network in existence is a distributed monolith so long as they communicate with one another.
If you communicate with one another you are serializing and deserializing a shared type. That shared type will break at the communication channels if you do not simultaneously deploy the two services. The irony is to prevent this you have to deploy simultaneously and treat it as a distributed monolith.
This is the fundamental problem of microservices. Under a monorepo it is somewhat mitigated, because now you can have type checking and integration tests across multiple services.
Make no mistake, the world isn’t just library dependencies. There are communication dependencies that flow through communication channels. A microservice architecture by definition has all its services depend on each other through these communication channels. The logical outcome of this is virtually identical to a distributed monolith. In fact, shared libraries don’t do much damage at all if the versions are off. It is only shared types in the communication channels that break.
There is no way around this unless you have a mechanism for simultaneously merging and deploying code across different repos, which breaks the definition of what it is to be a microservice. Microservices always, and I mean always, share dependencies with everything they communicate with. All the problems that come from shared libraries are intrinsic to microservices EVEN when you remove shared libraries.
I believe in the original amazon service architecture, that grew into AWS (see “Bezos API mandate” from 2002), backwards compatibility is expected for all service APIs. You treat internal services as if they were external.
That means consumers can keep using old API versions (and their types) with a very long deprecation window. This results in loose coupling. Most companies doing microservices do not operate like this, which leads to these lockstep issues.
Yeah, that's a bad thing, right? Maintaining backward compatibility until the end of time in the name of safety.
I'm not saying monoliths are better than microservices.
I'm saying for THIS specific issue, you will not need to even think about API compatibility with monoliths. It's a concept you can throw out the window, because type checkers and integration tests catch this FOR YOU automatically, and the single deployment ensures that compatibility will never break.
If you choose a monolith you are CHOOSING this convenience; if you choose microservices you are CHOOSING the possibility for things to break, and AWS chose this and chose to introduce a backwards compatibility restriction to deal with that problem.
I use "choose" loosely here. More likely AWS ppl just didn't think about this problem at the time. It's not obvious... or they had other requirements that necessitated microservices... The point is, this problem in essence is a logical consequence of the choice.
> Yeah, that's a bad thing, right? Maintaining backward compatibility until the end of time in the name of safety.
This is what I don't get about some comments in this thread. Choosing internal backwards compatibility for services managed by a team of three engineers doesn't make a lot of sense to me. You (should) have the organizational agility to make big changes quickly; not a lot of consensus building should be required.
For the S3 APIs? Sure, maintaining backwards compatibility on those makes sense.
Backwards compatibility is for customers. If customers don’t want to change apis… you provide backwards compatibility as a service.
If you’re using backwards compatibility as safety and that prevents you from doing a desired upgrade to an api that’s an entirely different thing. That is backwards compatibility as a restriction and a weakness in the overall paradigm while the other is backwards compatibility as a feature. Completely orthogonal actions imo.
> or they had other requirements that necessitated microservices
Scale
Both in people, and in "how do we make this service handle the load". A monolith is easy if you have few developers and not a lot of load.
With more developers it gets hard, as they start affecting each other across this monolith.
With more load it gets difficult, as the usage profile of a backend server becomes very varied and performance issues become hard to even find. What looks like a performance loss in one area might just be another unrelated part of the monolith eating your resources.
Exactly, performance can make it necessary to move away from a monolith.
But everyone should know that microservices are more complex systems and harder to deal with and a bunch of safety and correctness issues that come with it as well.
The problem here is that not many people know this. Some people think going to microservices makes your code better, whereas I'm clearly saying here that you give up safety and correctness as a result.
> If you communicate with one another you are serializing and deserializing a shared type.
Yes, this is absolutely correct. The objects you send over the wire are part of an API which forms a contract the server implementing the API is expected to provide. If the API changes in a way which is not backwards compatible, this will break things.
> That shared type will break at the communication channels if you do not simultaneously deploy the two services.
This is only true if you change the shared type in a way which is not backwards compatible. One of the major tenets of services is that you must not introduce backwards incompatible changes. If you want to make a fundamental change, the process isn't "change APIv1 to APIv2", it's "deploy APIv2 alongside APIv1, mark APIv1 as deprecated, migrate clients to APIv2, remove APIv1 when there's no usage."
This may seem arduous, but the reality is that most monoliths already deal with this limitation! Don't believe me? Think about a typical n-tier architecture with a backend that talks to a database; how do you do a naive, simple rename of a database column in e.g. MySQL in a zero-downtime manner? You can't. You need to have some strategy for dealing with the backwards incompatibility which exists when your code and your database do not match. The strategy might be a simple add new column->migrate code->remove old column, including some thought on how to deal with data added in the interim. It might be to use views. It might be some insane strategy of duplicating the full stack, using change data capture to catch changes and flipping a switch.[0] It doesn't really matter, the point is that even within a monolith, you have two separate services, a database and a backend server, and you cannot deploy them truly simultaneously, so you need to have some strategy for dealing with that; or more generally, you need to be conscious of breaking API changes, in exactly the same way you would with independent services.
> The logical outcome of this is virtually identical to a distributed monolith.
Having seen the logical outcome of this at AWS, Hootsuite, Splunk, among others: no this isn't true at all really. e.g. The RDS team operated services independently of the EC2 team, despite calling out to EC2 in the backend; in no way was it a distributed monolith.
[0] I have seen this done. It was as crazy as it sounds.
>This is only true if you change the shared type in a way which is not backwards compatible. One of the major tenets of services is that you must not introduce backwards incompatible changes. If you want to make a fundamental change, the process isn't "change APIv1 to APIv2", it's "deploy APIv2 alongside APIv1, mark APIv1 as deprecated, migrate clients to APIv2, remove APIv1 when there's no usage."
Agreed, and this is a negative. Backwards compatibility is a restriction made to deal with something fundamentally broken.
Additionally, in any system of services you will eventually have to make a breaking change. Backwards compatibility is a behavioral coping mechanism to deal with a fundamental issue of microservices.
>This may seem arduous, but the reality is that most monoliths already deal with this limitation! Don't believe me? Think about a typical n-tier architecture with a backend that talks to a database; how do you do a naive, simple rename of a database column in e.g. MySQL in a zero-downtime manner? You can't. You need to have some strategy for dealing with the backwards incompatibility.
I believe you and am already aware. It's a limitation that exists intrinsically, so you have no choice: a database and a monolith need to exist as separate services. The thing I'm addressing here is the microservices vs. monolith debate. If you choose microservices, you are CHOOSING for this additional problem to exist. If you choose a monolith, then within that monolith you are CHOOSING for those problems to not exist.
I am saying regardless of the other issues with either architecture, this one is an invariant in the sense that for this specific thing, monolith is categorically better.
>Having seen the logical outcome of this at AWS, Hootsuite, Splunk, among others: no this isn't true at all really. e.g. The RDS team operated services independently of the EC2 team, despite calling out to EC2 in the backend; in no way was it a distributed monolith.
No, you're categorically wrong. If they did this in ANY of the companies you worked at then they are living with this issue. What I'm saying here isn't an opinion. It is a theorem-based consequence that will occur IF all the axioms are satisfied: namely, two or more services that communicate with each other and are NOT deployed simultaneously. This is logic.
The only way errors or issues never happened with any of the teams you worked with is if the services they were building NEVER needed to make a breaking change to the communication channel, or they never needed to communicate. Neither of these scenarios is practical.
> The only way errors or issues never happened with any of the teams you worked with is if the services they were building NEVER needed to make a breaking change to the communication channel, or they never needed to communicate. Neither of these scenarios is practical.
IMO the fundamental point of disagreement here is that you believe it is effectively impossible to evolve APIs without breaking changes.
I don't know what to tell you other than, I've seen it happen, at scale, in multiple organizations.
I can't say that EC2 will never make a breaking change that causes RDS, Lambda, or auto-scaling to break, but if they do, it'll be front page news.
>IMO the fundamental point of disagreement here is that you believe it is effectively impossible to evolve APIs without breaking changes.
No, it's certainly possible. You can evolve Linux, macOS and Windows forever without any breaking changes and keep all APIs backward compatible for all time. Keep going forever and ever and ever. But you see there's a huge downside to this, right? This downside becomes more and more magnified as time goes on. In the early stages it's fine. And it's not like this growing problem will stop everything in its tracks. I've seen organizations hobble along forever with tech debt that keeps increasing for decades.
The downside won't kill an organization. I'm just saying there is a way that is better.
>I don't know what to tell you other than, I've seen it happen, at scale, in multiple organizations.
I have as well. It doesn't mean it doesn't work or can't be done. For example, TypeScript is better than JavaScript, but you can still build a huge organization around JavaScript. What I'm saying is that one is intrinsically better than the other, but that doesn't mean you can't build something on technology or architectures that are inferior.
And also I want to say I'm not saying monoliths are better than microservices. I'm saying for this one aspect monoliths are definitively better. There is no tradeoff for this aspect of the debate.
>I can't say that EC2 will never made a breaking change that causes RDS, lambda, auto-scaling to break, but if they do, it'll be front page news.
Didn't a break happen recently? Barring that... there are behavioral ways to mitigate this, right? Like what you mentioned: backward compatible APIs always. But it's better to set up your system such that the problem just doesn't exist, period, rather than to set up ways to deal with the problem.
> The only way errors or issues never happened with any of the teams you worked with is if the services they were building NEVER needed to make a breaking change to the communication channel, or they never needed to communicate.
This is correct.
> Neither of these scenarios is practical.
This is not. When you choose appropriate tools (protobuf being an example), it is extremely easy to make a non-breaking change to the communication channel, and it is also extremely easy to prevent breaking changes from being made ever.
Protobuf works best if you have a monorepo. If each of your services lives within its own repo, then upgrades to one repo can be merged onto the main branch that potentially break things in other repos. Protobuf cannot check for this.
Second, the other safety check protobuf relies on is backwards compatibility. But that's an arbitrary restriction, right? It's better to not even have to worry about backwards compatibility at all than it is to maintain it.
Categorically these problems don't even exist in the monolith world. I'm not taking a side in the monolith vs. microservices debate. All I'm saying is for this aspect monoliths are categorically better.
You usually can't simultaneously deploy two services. You can try, but in a non trivial environment there are multiple machines and you'll want a rolling upgrade, which causes an old client to talk to a new service or vice versa. Putting the code into a monorepo does nothing to fix this.
This is much less of a problem than it seems.
You can use a serialisation format that allows easy backward compatible additions. The new service that has a new feature adds a field for it. The old client, responsibly coded, gracefully ignores the field it doesn't understand.
You can version the API to allow for breaking changes, and serve old clients old responses, and new clients newer responses. This is a bit of work to start and sometimes overkill, given the first point
If you only need very rare breaking changes, you can deploy new-version-tolerant clients first, then when that's fully done, deploy the new-version service. It's a bit of faff, but if it's very rare and internal, it's often easier than implementing full versioning.
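To make the first point concrete, here is a small Go sketch of an additive change with a tolerant old client; the Order payload and field names are made up, and the only property it relies on is that encoding/json ignores unknown fields by default:

    // Minimal Go sketch of the first point: the new service adds an optional
    // field, and a responsibly coded old client simply ignores what it does not
    // understand. The Order payload and field names are made up; the property
    // shown is just that encoding/json drops unknown fields by default.
    package main

    import (
        "encoding/json"
        "fmt"
    )

    // The old client's view of the payload: it only knows about id and amount.
    type OrderOldClient struct {
        ID     string `json:"id"`
        Amount int    `json:"amount"`
    }

    func main() {
        // The upgraded service now also sends "currency".
        newResponse := []byte(`{"id":"o-1","amount":1299,"currency":"USD"}`)

        var o OrderOldClient
        if err := json.Unmarshal(newResponse, &o); err != nil {
            panic(err)
        }
        // The extra field is silently ignored, so the old client keeps working
        // during a rolling upgrade.
        fmt.Printf("old client decoded: %+v\n", o)
    }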
> You usually can't simultaneously deploy two services
Yeah, it’s a roundabout solution to create something that deploys two things simultaneously. Agreed.
> Putting the code into a monorepo does nothing to fix this.
It helps mitigate the issue somewhat. If it was a polyrepo you suffer from an identical problem with the type checker or the integration test. The checkers basically need all services to be at the same version to do a full and valid check, so if you have different teams and different repos the checkers will never know if team A made a breaking change that will affect team B, because the integration test and type checker can’t stretch to another repo. Even if they could stretch to another repo you would need to do a “simultaneous” merge… in a sense polyrepos suffer from the same issue as microservices on the CI verification layer.
So if you have microservices and polyrepos you are suffering from a twofold problem. Your static checks and integration tests are never correct and are always either failing and preventing you from merging, or deliberately crippled so as to not validate things across repos. At the same time your deploys are also guaranteed to break if a breaking API change is made. You literally give up safety in testing, safety in type checking and working deploys by going microservices and polyrepos.
Like you said it can be fixed with backward compatibility, but it’s a bad thing to restrict your code that way.
> This is much less of a problem than it seems.
It is not “much less of a problem than it seems”, because big companies have developed methods to do simultaneous deploys. See Netflix. If they took the time to develop a solution, it means it’s not a trivial issue.
Additionally, are you aware of any API issues in communication between modules of your local code in a single app? Do you have any problems with this such that you are aware of it and come up with ways to deal with it? No. In a monolith the problem is nonexistent and it doesn’t even register. You are not aware this problem exists until you move to micro-services. That’s the difference here.
> You can use a serialisation format that allows easy backward compatible additions.
Mentioned a dozen times in this thread. Backwards compatibility is a bad thing. It’s a restriction that freezes all technical debt into your code. Imagine python 3 stayed backward compatible with 2 or the current version of macOS was still compatible with binaries from the first Mac.
> You can version the API to allow for breaking changes, and serve old clients old responses, and new clients newer responses. This is a bit of work to start and sometimes overkill, given the first point
Can you honestly tell me this is a good thing? The fact that you have to pay attention to this in microservices, while in a monolith you don’t even need to be aware there’s an issue, tells you all you need to know. You’re just coming up with behavioral workarounds and coping mechanisms to make microservices work in this area. You’re right, it does work. But it’s a worse solution for this problem than a monolith, which doesn’t have these workarounds because these problems don’t exist in monoliths.
> If you only need very rare breaking changes, you can deploy new-version-tolerant clients first, then when that's fully done, deploy the new-version service. It's a bit of faff, but if it's very rare and internal, it's often easier than implementing full versioning.
It’s only very rare in microservices because it’s weaker. You deliberately make it rare because of this problem. Is it rare to change a type in a monolith? No, it happens on the regular. See the problem? You’re not realizing it, but everything you’re bringing up is a behavioral action to cope with an aspect that is fundamentally weaker in microservices.
Let me conclude by saying that there are many reasons why microservices are picked over monoliths. But what we are talking about here is definitively worse. Once you go microservices you are giving up safety and correctness and replacing them with workarounds. There is no trade-off for this problem; it is a logical consequence of using microservices.
Of course things are easier if you can run all your code in one binary on one machine, without remote users or any need to scale.
As soon as you add users you need to start coping with backwards compatibility though, even if your backend is still a monolith.
The backend being a monolith is probably easier for a while yes, but I've also lost count of the number of companies I've been to who have been in the painful process of breaking apart their monolith, because it doesn't scale and is hard to work with
Microservices or SOA aren't trivial, but the problems you bring up as extremely hard are pretty easy to deal with once you know how; and it buys you independent deploys and scale, which are pretty useful things at a bigger place
> That shared type will break at the communication channels if you do not simultaneously deploy the two services.
No. Your shared type is too brittle to be used in microservices. Tools like the venerable protobuf solved this problem decades ago. You have a foundational wire format that does not change. Then you have a schema layer that can change in backwards compatible ways. Every new addition is optional.
Here’s an analogy. Forget microservices. Suppose you have a monolithic app and a SQL database. The situation is just like when you change the schema of the SQL database: of course you have application code that correctly deals with both the previous schema and the new schema during the ALTER TABLE. And the foundational wire format that you use to talk to the SQL database does not change. It’s at a layer below the schema.
This is entirely a solved problem. If you think this is a fundamental problem of microservices, then you do not grok microservices. If you think having microservices means simultaneous deployments, you also do not grok microservices.
1. Protobuf requires a monorepo to work correctly. Shared types must be checked across all repos and services simultaneously. Without a monorepo or some crazy workaround mechanism this won't work. Think about it: these type checkers need everything at the same version to correctly check everything.
2. Even with a monorepo, deployment is a problem. Unless you do simultaneous deploys, if one team upgrades their service and another team doesn't, the shared type is incompatible, simply because you used microservices and polyrepos to allow teams to move async instead of in sync. It's a race condition in distributed systems and it follows as a theorem. Not solved at all, because it can't be solved by logic and math.
Just kidding. It can be solved but you're going to have to change definitions of your axioms aka of what is currently a microservice, monolith, monorepo and polyrepo. If you allow simultaneous deploys or pushes to microservices and polyrepos these problems can be solved but then can you call those things microservices or polyrepos? They look more like monorepos or monoliths... hmmm maybe I'll call it "distributed monolith".... See we are hitting this problem already.
>Here’s an analogy. Suppose you have a monolithic app and a SQL database. The situation is just like when you change the schema of the SQL database: of course you have application code that correctly deals with the previous schema and the new schema during the ALTER TABLE. And the foundational wire format that you use to talk to the SQL database does not change. It’s at a layer below the schema.
You are just describing the problem I pointed out. We call "monoliths" monoliths, but technically a monolith must interact with a secondary service called a database. We have no choice in the matter. The monolith vs. microservices debate of course does not refer to that boundary, which SUFFERS from all the same problems as microservices.
>This is entirely a solved problem. If you think this is a fundamental problem of microservices, then you do not grok microservices. If you think having microservices means simultaneous deployments, you also do not grok microservices.
No it's not. Not at all. It's a problem that's lived with. I have two modules in a monolith. ANY change that goes into the mainline branch or deploy is type checked and integration tested to provide maximum safety as integration tests and type checkers can check the two modules simultaneously.
Imagine those two modules as microservices. Because they can be deployed at any time asynchronously, and because they can be merged to the mainline branch at any time asynchronously, they cannot be type checked or integration tested together. Why? If I upgrade A, which requires an upgrade to B, but B is not upgraded yet, how do I type check both A and B at the same time? Axiomatically impossible. Nothing is solved. Just behavioral coping mechanisms to deal with the issue. That's the key phrase: behavioral coping mechanisms, as opposed to automated, statically checked safety based on mathematical proof. Most of the arguments from your side will consist of this: "behavioral coping mechanisms".
> Then you have a schema layer that could change in backwards compatible ways. Every new addition is optional.
Also known as the rest of the fucking owl. I am entirely in factual agreement with you, but the number of people who are even aware they maintain an API surface with backwards compatibility as a goal, let alone can actually do it well, is tiny in practice. Especially for internal services, where nobody will even notice violations until it’s urgent, and at such a time, your definitions won’t save you from blame. Maybe they should, though. The best way to stop a bad idea is to follow it rigorously and see where it leads.
I’m very much a skeptic of microservices, because of this added responsibility. Only when the cost of that extra maintenance is outweighed by overwhelming benefits elsewhere, would I consider it. For the same reason I wouldn’t want a toilet with a seatbelt.
Bingo. Couldn't agree more. The other posters in this comment chain seem to view things from a dogmatic approach vs a pragmatic approach. It's important to do both, but individuals should call out when they are discussing something that is practiced vs preached.
My thesis is logical and derived from axioms. You will have fundamental incompatibilities between service APIs if one service changes its API. That’s a given. It’s 1 + 1 = 2.
Now I agree there are plenty of ways to successfully deal with these problems like api backwards compatibility, coordinated deploys… etc… etc… and it’s a given thousands of companies have done this successfully. This is the pragmatic part, but that’s not ultimately my argument.
My argument is that none of the pragmatisms and methodologies to deal with those issues need to exist in a monolithic architecture, because the problem itself doesn’t exist in a monolith.
Nowhere did I say microservices can’t be successfully deployed. I only stated that there are fundamental issues with microservices that by logic must occur definitionally. The issue is people are biased. They tie their identity to an architecture because they advocated it for too long. The funniest thing is that I didn’t even take a side. I never said microservices were better or worse. I was only talking about one fundamental problem with microservices. There are many reasons why microservices are better but I just didn’t happen to bring it up. A lot of people started getting defensive and hence the karma.
Agreed. What I’m describing here isn’t solely pragmatic, it’s axiomatic as well. If you model this as a distributed system graph, all microservices by definition will eventually reach a state where the APIs are broken.
Most microservice companies either live with that fact or they have roundabout ways to deal with it, including simultaneous deploys across multiple services and simultaneous merging, CI and type checking across different repos.
Once all the code for the services lived in one repo there was nothing preventing them from deploying the thing 140 times. I’m not sure why they act like that wasn’t an option.
If there’s any shared library across all your services, even a third party library, if that library has a security patch you now need to update that shared library across your entire service fleet. Maybe you don’t have that; maybe each service is written in a completely different programming language, uses a different database, and reimplements monitoring in a totally different way. In that case you have completely different problems.
My experience doesn't align with yours. I worked at SendGrid for over a decade and they were on the (micro) service train. I was on call for all dev teams on a rotation for a couple of years and later just for my team.
I have seen like a dozen security updates like you describe.
This was at a fintech and we took every single little vuln with the utmost priority. Triaged by severity of course, but everything had a ticking clock.
We didn't just have multiple security teams, we had multiple security orgs. If you didn't stay in compliance with VULN SLAs, you'd get a talking to.
We also had to frequently roll secrets. If the secrets didn't support auto-rotation, that was also a deployment (with other steps).
We also had to deploy our apps if they were stale. It's dangerous not to deploy your app every month or two, because who knows if stale builds introduced some kind of brittleness? Perhaps a change to some net library you didn't deploy caused the app not to tolerate traffic spikes. And it's been six months and there are several such library changes.
In my previous company we did everything as a micro service. In the company before that it was serverless on AWS!
In both cases we had to come up with clever solutions simply to get by, because communication between services is a problem. It is difficult (not impossible) to keep all the contracts in sync, and deployment sometimes has to be coordinated in a very specific way. The initial speed you get is soon lost further down the path due to added complexities. There was fear-driven development at play. Service ownership is a problem. Far too many meetings are spent on coordination.
In my latest company everything is part of the same monolith. Yes, the code is huge but it is so much easier to work with. We use a lot more unit tests than integration tests. Types make sense. Refactoring is just so easy. All the troubleshooting tools, including specialised AI agents built on top of our own platform, are part of the code-base, which is kind of interesting because I can see how this is turning into a self-improving system. It is fascinating!
We are not planning to break up the monolith unless we grow so much that it is impossible to manage from a single git repository. As far as I can tell this may never happen, as it is obvious that much larger projects are perfectly well maintained in the exact same way.
The only downside is that builds take longer, but honestly we found ways around that as well in the past, and now, with further improvements in the toolchains delivered by the awesome open-source communities around the world, I expect to see at least a 10x improvement in deployment time in 2026.
Overall, in my own assessment, the decision to go for a monolith allowed us to build and scale much faster than if we had used micro services.
My experience is the opposite. I worked at SendGrid for a decade and we scaled the engineering org from a dozen to over 500 operating at scale sending billions of messages daily. Micro services. Well, services. The word micro messes people up.
I have also worked at half a dozen other shops with various architectures.
In every monolith, people violate separation of concerns and things are tightly coupled. I have only ever seen good engineering velocity happen when teams are decoupled from one another. I have only seen this happen in a (micro) service architecture.
Overall, in my own assessment, the decision to stick with a monolith has slowed down velocity and placed limits on scale at every other company I have been at and require changes towards decoupled services to be able to ship with any kind of velocity.
The place I just left took 2 years, over 50 teams, and over 150 individual contributors to launch a product that required us to move an interface for sending messages over from ORM querysets to DTOs. We needed to unlock our ability to start rearchitecting the modules, because before it was impossible to know the actual edges of the system and how it used the data. This was incredibly expensive and hard, and it would never have been necessary but for the ability to reach into others' domains and make assumptions, which made things hard.
Don't couple systems. Micro services are the only arch I have seen successfully do this.
You say that every monolith you’ve seen has devolved into bad engineering — coupling, crossing boundaries. What was missing that could have stopped this? A missing service boundary you’d say, but also a lack of engineering leadership or lack of others’ experience? No code review? A commercial non-founder CEO pushing for results at the expense of codebase serviceability? Using a low-ceremony language (no types, no interfaces)?
You can stop coupling by enforcing boundaries. Repository boundaries are extremely solid ones. Too solid for some people, making it unnecessarily hard to coordinate changes on either side of the boundary. Barely solid enough for others, where it’s clearly too dangerous to let their devs work without solid brick walls keeping their hackers apart.
Coupling, smudged boundaries, and incoherence are symptoms of something more fundamental than simply we didn’t use services like we should have. If everyone’s getting colds off each other it’s because of bad hygiene in the office. Sure, you could force them to remain in their cubicles or stay at home but you could also teach them to wash their hands!
Generalizations don't help almost any discussion. Even if that's 100% of what you ever saw, many people have seen mixed (or entirely in the other extreme).
> In every monolith, people violate separation of concerns and things are tightly coupled. I have only ever seen good engineering velocity happen when teams are decoupled from one another. I have only seen this happen in a (micro) service architecture.
I would write this off as indifferent or incompetent tech leadership. Even languages that people call obscure -- like Elixir that I mostly work with -- have excellent libraries and tools that can help you draw enforceable boundaries. Put that in CI or even in pre-commit hook. Job done.
Why was that never done?
Of course people will default to the easier route. It's on tech leadership to keep them to higher standards.
Funny you mention Elixir. At one company, we passed around Ecto querysets. It started when the company was smaller. Then someone needed a little bit of analytics. A few years of organic growth later, the system was bogged down. Queries were joining all over the place, and separating out the analytics from everything else was, again, a major undertaking.
I would love to see a counter example in real life at an org with over a dozen teams. A well working monolith and a well working monorepo are like unicorns; I don't believe they exist and everyone is talking about and trying to sell their mutant goat as one.
I am not selling you anything so you're starting off from a wrong premise.
What I said is that you should consider your experience prone to a bubble environment and as such it's very anecdotal. So is mine (a mix), granted. Which only means that neither extreme dominates out there. Likely a normal bell curve distribution.
What I did say (along with others) was that a little bit of technical discipline -- accentuating on "little" here -- nullifies the stated benefits of microservice architecture.
And it seems to me that the microservice architecture was chosen to overcome organizational problems, not technical ones.
Reading it with hindsight, their problems have less to do with the technical trade off of micro or monolith services and much more to do with the quality and organizational structure of their engineering department. The decisions and reasons given shine a light on the quality. The repository and test layout shine a light on the structure.
Given the quality and the structure neither approach really matters much. The root problems are elsewhere.
My observation is that many teams lack strong "technical discipline"; someone that says "no, don't do that", makes the case, and takes a stand. It's easy to let the complexity genie out of the bottle if the team doesn't have someone like this with enough clout/authority to actually make the team pause.
I think the problem is that this microservices vs monolith decision is a really hard one to convince people of. I made a passionate case for ECS instead of lambda for a long time, but only after the rest of the team and leadership see the problems the popular strategy generates do we get something approaching uptake (and the balance has already shifted to kubernetes instead, which is at least better)
I 100% agree with you, but the sad fact is that it’s easy to understand why people don’t want to take this role. You can make enemies easily, you need to deliver “bad news” and convince people to put in more effort or prove that the effort they did put in was not enough. Why bother when you probably won’t be the one that has to clean it up?
>the quality and organizational structure of their engineering department
You're not kidding. I had to work with twilio on a project and it was awful. Any time there was an issue with the API, they'd never delve into why that issue had happened. They'd simply fix the data in their database and close the ticket. We'd have the same issue over and over and over again and they'd never make any effort to fix the cause of the problems.
But perhaps the most famous source is Tolkien: "The Dwarves tell no tale; but even as mithril was the foundation of their wealth, so also it was their destruction: they delved too greedily and too deep, and disturbed that from which they fled, Durin's Bane."
As a non-native speaker, I read a lot of fantasy and science fiction books in English. I use "delve" regularly (I wouldn't say "frequently" though). Not sure if it's Terry Pratchett's Discworld influence, but plenty of archaic sounding words there.
I did not even know it was considered uncommon and archaic, tbh.
It's amazing how much explanatory power it has, to the point that I can predict at least some traits about a company's codebase during an interview process, without directly asking them about it.
1. "Peter principle": "people in a hierarchy and organizations tend to rise to 'a level of respective incompetence' "
2. "Parkinson's law": "Work expands to fill the available time".
So people are filling all the available time and working tirelessly to reach their personal and organizational levels of incompetence; working hard without stopping to think if what they are doing should be done at all. And nobody is stopping them, nobody asks why (with a real analysis of positives, negatives, risks).
Incompetent + driven is the worst combination there can be.
A few thoughts: this is not really a move to a monolith. Their system is still a SOA (service-oriented architecture), just like microservices (make services as small as they can be), but with larger scope.
Having 140 services managed by what sounds like one team reinforces another point that I believe should be well known by now: you use SOAs (including microservices) to scale teams, not services.
Eg. if a single team builds a shared library for all the 140 microservices and needs to maintain them, it's going to become very expensive quickly: you'll be using v2.3.1 in one service and v1.0.8 in another, and you won't even know yourself what API is available. Operationally, yes, you'll have to watch over 140 individual "systems" too.
There are ways to mitigate this, but they have their own trade-offs (I've posted them in another comment).
As per Conway's law, software architecture always follows the organizational structure, and this seems to have happened here: a single team is moving away from unneeded complexity to more effectively manage their work and produce better outcomes for the business.
It is not a monolith, but properly-scoped service level (scoped to the team). This is, in my experience, the sweet spot. A single team can run and operate multiple independent services, but with growth in those services, they will look to unify, so you need to restructure the team if you don't want that to happen. This is why I don't accept "system architect" roles as those don't give you the tools to really drive the architecture how it can be driven, and I really got into "management" :)
I am _not_ a microservices guy (like... at all) but reading this the "monorepo"/"microservices" false dichotomy stands out to me.
I think way too much tooling assumes 1:1 pairings between services and repos (_especially_ CI work). In huge orgs Git/whatever VCS you're using would have problems with everything in one repo, but I do think that there's loads of value in having everything in one spot even if it's all deployed more or less independently.
But so many settings and workflows couple repos together so it's hard to even have a frontend and backend in the same place if both teams manage those differently. So you end up having to mess around with N repos and can't send the one cross-cutting pull request very easily.
I would very much like to see improvements on this front, where one repo could still be split up on the forge side (or the CI side) in interesting ways, so review friction and local dev work friction can go down.
(shorter: github and friends should let me point to a folder and say that this is a different thing, without me having to interact with git submodules. I think this is easier than it used to be _but_)
I worked on building this at $PREV_EMPLOYER. We used a single repo for many services, so that you could run tests on all affected binaries/downstream libraries when a library changed.
We used Bazel to maintain the dependency tree, and then triggered builds based on a custom Github Actions hook that would use `bazel query` to find the transitive closure of affected targets. Then, if anything in a directory was affected, we'd trigger the set of tests defined in a config file in that directory (defaulting to :...), each as its own workflow run that would block PR submission. That worked really well, with the only real limiting factor being the ultimate upper limit of a repo in Github, but of course took a fair amount (a few SWE-months) to build all the tooling.
We’re in the middle of this right now. Go makes this easier: there’s a go CLI command that you can use to list a package’s dependencies, which can be cross-referenced with recent git changes. (duplicating the dependency graph in another build tool is a non-starter for me) But there are corner cases that we’re currently working through.
This, and if you want build + deploy that’s faster than doing it manually from your dev machine, you pay $$$ for either something like Depot, or a beefy VM to host CI.
A bit more work on those dependency corner cases, along with an auto-sleeping VM, should let us achieve nirvana. But it’s not like we have a lot of spare time on our small team.
* In addition, you can make your life a lot easier by just making the whole repo a single Go module. Having done the alternate path - trying to keep go.mod and Bazel build files in sync - I would definitely recommend only one module per repo unless you have a very high pain tolerance or actually need to be able to import pieces of the repo with standard Go tooling.
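For illustration, a rough Go sketch of the go-tooling approach described above, assuming a hypothetical monorepo layout where each deployable service lives under ./cmd/...; it shells out to git and `go list -deps -json`, and it deliberately ignores the corner cases mentioned:

    // Rough Go sketch of the "cross-reference go list with recent git changes"
    // approach described above. It assumes a hypothetical monorepo layout where
    // each deployable service is a package under ./cmd/...; it is an illustration
    // of the idea, not a finished tool.
    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "os/exec"
        "path/filepath"
        "strings"
    )

    type pkg struct {
        ImportPath string
        Dir        string
        Deps       []string
    }

    func main() {
        // 1. Files touched since main, per git, reduced to their directories.
        diff, err := exec.Command("git", "diff", "--name-only", "origin/main...HEAD").Output()
        if err != nil {
            panic(err)
        }
        changedDirs := map[string]bool{}
        for _, f := range strings.Split(strings.TrimSpace(string(diff)), "\n") {
            if f != "" {
                changedDirs[filepath.Dir(f)] = true
            }
        }

        // 2. Every service package plus its transitive dependencies.
        list, err := exec.Command("go", "list", "-deps", "-json", "./cmd/...").Output()
        if err != nil {
            panic(err)
        }
        byImport := map[string]pkg{}
        var services []pkg
        dec := json.NewDecoder(bytes.NewReader(list))
        for dec.More() {
            var p pkg
            if err := dec.Decode(&p); err != nil {
                panic(err)
            }
            byImport[p.ImportPath] = p
            if strings.Contains(p.ImportPath, "/cmd/") {
                services = append(services, p)
            }
        }

        // 3. A service is affected if it, or any package it depends on,
        //    lives in a changed directory.
        rootBytes, _ := exec.Command("git", "rev-parse", "--show-toplevel").Output()
        root := strings.TrimSpace(string(rootBytes))
        inChangedDir := func(p pkg) bool {
            rel, err := filepath.Rel(root, p.Dir)
            return err == nil && changedDirs[rel]
        }
        for _, svc := range services {
            affected := inChangedDir(svc)
            for _, dep := range svc.Deps {
                if d, ok := byImport[dep]; ok && inChangedDir(d) {
                    affected = true
                }
            }
            if affected {
                fmt.Println("rebuild and redeploy:", svc.ImportPath)
            }
        }
    }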
> a beefy VM to host CI
Unless you really need to self-host, Github Actions or GCP Cloud Build can be set up to reference a shared Bazel cache server, which lets builds be quite snappy since it doesn't have to rebuild any leaves that haven't changed.
I've heard horror stories about Bazel, but a lot of them involve either not getting full buy in from the developer team or not investing in building out Bazel correctly. A few months of developer time upfront does seem like a steep ask.
You're pointing out exactly what bothered me with this post in the first place: "we moved from microservices to a monolith and our problems went away"...
... except the problems had not much to do with the service architecture but all to do with operational mistakes and insufficient tooling: bad CI, bad autoscaling, bad oncall.
Both approaches can fail. Especially in environments like Node.js or Python, there's a clear limit to how much code an event loop can handle before performance seriously degrades.
I managed a product where a team of 6–8 people handled 200+ microservices. I've also managed other teams at the same time on another product where 80+ people managed a monolith.
What I learned? Both approaches have pros and cons.
With microservices, it's much easier to push isolated changes with just one or two people. At the same time, global changes become significantly harder.
That's the trade-off, and your mental model needs to align with your business logic. If your software solves a tightly connected business problem, microservices probably aren't the right fit.
On the other hand, if you have a multitude of integrations with different lifecycles but a stable internal protocol, microservices can be a lifesaver.
If someone tries to tell you one approach is universally better, they're being dogmatic/religious rather than rational.
Ultimately, it's not about architecture, it's about how you build abstractions and approach testing and decoupling.
> If your software solves a tightly connected business problem, microservices probably aren't the right fit.
If your software solves a single business problem, it probably belongs in a single (still micro!) service under the theory underlying microservices, in which the "micro" is defined in business terms.
If you are building services at a lower level than that, they aren't microservices (they may be nanoservices.)
To me this rationalization has always felt like duct tape over the real problem, which is that the runtime is poorly suited to what people are trying to do.
These problems are effectively solved on beam, the jvm, rust, go, etc.
Can you explain a bit more about what you mean by a limit on how much code an event loop can handle? What's the limit, numerically, and which units does it use? Are you running out of CPU cache?
I assume he means, how much work you let the event loop do without yielding. It doesn't matter if there's 200K lines of code but no real traffic to keep the event loop busy.
Most people don't realize their applications are running like dogwater on Node because serverless is letting them smooth it over by paying 4x what they would be paying if they moved 10 or so lines of code and a few regexes to a web worker.
(and I say that as someone who caught themselves doing the same: serverless is really good at hiding this.)
Depends on your definition of "scale", but yes. I ran an app serving ~1k requests/second from a Django monolith around 2017, distributed across ~20 Heroku "dynos". Nowadays a couple bare-metal servers will handle this.
The only rationale given for the initial switch to microservices is this:
> Initially, when the destinations were divided into separate services, all of the code lived in one repo. A huge point of frustration was that a single broken test caused tests to fail across all destinations.
You kept breaking tests in main so you thought the solution was to revamp your entire codebase structure? Seems a bit backward.
We had a similar problem in our monolith. Team #1 works on a feature, their code breaks tests. Team #2 works on another feature, their code is OK, but they can't move forward because of the failing tests from team #1. Plus often it takes additional time to figure out if it the tests fail because of feature #1 or feature #2, and who must fix them.
We solved it by simply giving every team their own dev environment (before merging to main). So if tests break in feature #1, it doesn't break anything for feature #2 or team #2. It's all confined to their environment. It's just an additional VM + a config in CI/CD. The only downside of this is that if there are conflicts between features they won't be caught immediately (only after one of the teams finally merges to main). But in our case it wasn't a problem because different teams rarely worked on the same parts of the monorepo at the same time due to explicit code ownership.
Thanks. It was a stupid idea for MOST shops. I think maybe it works for AWS, Google and Netflix, but everywhere in my career, I saw that 90% of the problems were due to microservices.
Dividing a system into composable parts is already a very, very difficult problem, and it is only foolish to introduce further network boundaries between them.
Next comeback I see is away from React and SPAs as view transitions become more common.
> Once the code for all destinations lived in a single repo, they could be merged into a single service. With every destination living in one service, our developer productivity substantially improved. We no longer had to deploy 140+ services for a change to one of the shared libraries. One engineer can deploy the service in a matter of minutes.
This is the problem with the undefined nature of the term `microservices`. In my experience, if you can't develop in a way that allows you to deploy all services independently and without coordination between services, it may not be a good fit for your org's needs.
In the parent SOA(v2), what they described is a well known anti-pattern: [0]
Application Silos to SOA Silos
* Doing SOA right is not just about technology. It also requires optimal cross-team communications.
Web Service Sprawl
* Create services only where and when they are needed. Target areas of greatest ROI, and avoid the service sprawl headache.
If you cannot, due to technical or political reasons, retain the ability to independently deploy a service (no matter whether you choose to actually deploy independently), you will not gain most of the advantages that were the original selling point of microservices, which had more to do with organizational scaling than technical concerns.
There are other reasons to consider the pattern, especially due to the tooling available, but it is simply not a silver bullet.
And yes, I get that not everyone is going to accept Chris Richardson's definitions[1], but even in more modern versions of this, people always seem to run into the most problems because they try to shove it in a place where the pattern isn't appropriate, or isn't possible.
But kudos to Twilio for doing what every team should be, reassessing if their previous decisions were still valid and moving forward with new choices when they aren't.
I would caution that microservices should be architected with technical concerns first; being able to deploy independently is a valid technical concern too.
Doing it for organizational scaling can lead to insular vision with turf defensive attitude, as teams are rewarded on the individual service’s performance and not the complete product’s performance. Also refactoring services now means organizational refactoring, so the friction to refactor is massively increased.
I agree that patterns should be used where most appropriate, instead of blindly.
What pains me is that a term like “Cloud-Native” has been usurped to mean microservices. Did Twilio just stop having a “Cloud-Native” product due to shipping a monolith? According to the CNCF, yes. According to reason, no.
In a discussion I was in recently, a participant mentioned "culture eats strategy for breakfast" .. which perhaps makes sense in this context. Be bold enough to do what makes the team and the product thrive.
> With everything running in a monolith, if a bug is introduced in one destination that causes the service to crash, the service will crash for all destinations
We can have a service with 100 features, but only enable the features relevant to a given "purpose". That way, we can still have "micro services" but they're running the same code: "bla.exe -foo" and "bla.exe -bar".
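A hedged Go sketch of that "same binary, different flags" idea, with made-up destination names; each instance compiles in every destination but only serves the ones selected at startup:

    // Hedged sketch of the "same binary, different flags" idea: every destination
    // is compiled into one binary, and a flag selects which ones this instance
    // serves (e.g. `bla -destinations=foo` vs `bla -destinations=bar`). Names are
    // made up for illustration.
    package main

    import (
        "flag"
        "fmt"
        "strings"
    )

    var handlers = map[string]func(event string){
        "foo": func(e string) { fmt.Println("foo handled", e) },
        "bar": func(e string) { fmt.Println("bar handled", e) },
        "baz": func(e string) { fmt.Println("baz handled", e) },
    }

    func main() {
        enabled := flag.String("destinations", "", "comma-separated destinations this instance handles")
        flag.Parse()

        active := map[string]bool{}
        for _, name := range strings.Split(*enabled, ",") {
            if _, ok := handlers[name]; ok {
                active[name] = true
            }
        }

        // The shared code path fans events out only to the enabled destinations,
        // so a crash in one destination only affects the instances that run it.
        for name := range active {
            handlers[name]("example-event")
        }
    }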
I have a feeling that microservices improve overall design when they can live on their own, as microapps perhaps, also with their own UI. What is the point of a service if it is not usable beyond its original design and is just bound to other similar services?
> we had to spend time fixing the broken test even if the changes had nothing to do with the initial change. In response to this problem, it was decided to break out the code for each destination into their own repos.
One could also change the way tests are run or selected. Or allow manual overrides to still deploy. Separating repos doesn't sound like the only logical solution
In practice most monoliths turned into "microservices" are just monoliths in disguise. They still have most of the failure modes of the original monolith, but now with all the complexity and considerable challenges of distributed computing layered on top.
Microservices as a goal is mostly touted by people who don't know what the heck they're doing - the kind of people who tend to mistakenly believe blind adherence to one philosophy or the other will help them turn their shoddy work into something passable.
Engineer something that makes sense. If, once you're done, whatever you've built fits the description of "monolith" or "microservices", that's fine.
However if you're just following some cult hoping it works out for your particular use-case, it's time to reevaluate whether you've chosen the right profession.
Microservices were a fad during a period where complexity and solving self-inflicted problems were rewarded more than building an actual sustainable business. It was purely a career- & resume-polishing move for everyone involved.
Putting this anywhere near "engineering" is an insult to even the shoddiest, OceanGate-levels of engineering.
I remember when microservices were introduced and they were solving real problems around 1) independent technological decisions with languages, data stores, and scaling, and 2) separating team development processes. They came out of Amazon, eBay, Google and a host of successful tech titans that were definitely doing "engineering." The Bezos mandate for APIs in 2002 was the beginning of that era.
It was when the "microservices considered harmful" articles started popping up that microservices had become a fad. Most of the HN early-startup energy will continue to do monoliths because of team communication reasons. And I predict that if any of those startups are successful, they will have need for separate services for engineering reasons. If anything, the historical faddishness of HN shows that hackers pick the new and novel because that's who they are, for better or worse.
Wow. Their experience could not be more different than mine. As I’m contemplating the first year of my startup I’ve tallied 6000 deployments and 99.997 percent uptime and a low single digit rollback percentage (MTTR in low single digit minutes and fractional, single cell impact for them so far). While I’m sure it’s possible for a solo entrepreneur to hit numbers like that with a monolith, I have never done so, and haven’t seen others do so.
Edit: I’d love to eat the humble pie here. If you have examples of places where monoliths are updated 10-20 times a day by a small (or large) team post the link. I’ll read them all.
The idea of deploying to production 10-20 times per day sounds terrifying. What's the rationale for doing so?
I'll assume you're not writing enough bugs that customers are reporting 10-20 new ones per day, but that leaves me confused why you would want to expose customers to that much churn. If we assume an observable issue results in a rollback and you're only rolling back 1-2% of the time (very impressive), once a month or so customers should experience observable issues across multiple subsequent days. That would turn me off making a service integral to my workflow.
If something is difficult or scary, do it more often. Smaller changes are less risky. Code that is merged but not deployed is essentially “inventory” in the factory metaphor. You want to keep inventory low. If the distance between the main branch and production is kept low, then you can always feel pretty confident that the main branch is in a good state, or at least close to one. That’s invaluable when you inevitably need to ship an emergency fix. You can just commit the fix to main instead of trying to find a known good version and patching it. And when a deployment does break something, you’ll have a much smaller diff to search for the problem.
There's a lot of middle ground between "deploy to production 20x a day" and "deploy so infrequently that you forget how to deploy". Like, once a day? I have nothing against emergency fixes, unless you're doing them 9-19x a day. Hotfixes should be uncommon (neither rare nor standard practice).
Speed is the rationale. I have zero hesitation to deploy and am extremely well practiced at decomposing changes into a series of small safe changes at this point. So maybe it's a single spelling correction, or perhaps it's the backend for a new service integration -- it's all the same to me.
Churn is kind of a loaded word, I'd just call it change. Improvements, efficiencies, additions and yes, of course, fixes.
It may be a little unfair to compare monoliths with distributed services when it comes to deployments. I often deploy three services (sometimes more) to implement a new feature, and that wouldn't be the case with a monolith. So 100% there is a lower number of deploys needed in that world (I know, I've been there). Unfortunately, there is also a natural friction that prevents deploying things as they become available. Google called that latency out in DORA for a reason.
I can't believe how many times I have seen companies try to implement microservices with multirepo, get lost in access management and versioning, and end up producing a fragile, overly complex hot mess.
Monolith is definitely what you want to start with.
Being able to ~instantly obtain a perfect list of all references to all symbols is an extraordinarily powerful capability. The stronger the type system, the more leverage you get. If you have only ever had experience with weak type systems or poor tooling, I could understand how the notion of putting everything into one compilation context seems pointless.
I humbly post this little widget to help your team decide if some functionality warrants being a separate service or not: https://mulch.dev/service-scorecard/
I don't think this blog post reflects so well on this engineering team. Kudos to them for being so transparent about it, though. "We had so many flaky tests that depended on 3rd parties that broke pipelines that we decided on micro-services" is not something I would put on my CV, at least.
That seems unfair. There's a lot we don't know about the politics behind the scenes. I'd bet that the individuals who created the microservice architecture aren't the same people who re-consolidated them into one service. If true, the authors of the article are being generous to the original creators of the microservices, which I think reflects well on them for not badmouthing their predecessors.
Too much of anything sucks. Too big of a monolith? Sucks. Too many microservices? Sucks. Getting the right balance is HARD.
Plus, it's ALWAYS easier/better to run v2 of something when you completely re-write v1 from scratch. The article could have just as easily been "Why Segment moved from 100 microservices to 5" or "Why Segment rewrote every microservice". The benefits of hindsight and real-world data shouldn't be undersold.
At the end of the day, write something, get it out there. Make decisions, accept some of them will be wrong. Be willing to correct for those mistakes or at least accept they will be a pain for a while.
In short: No matter what you do the first time around... it's wrong.
This is a horror story of being totally unable to understand your product and its behavior and throwing people and resources in large rewrites to only learn that you still don't understand your product and its behavior. Badly done tests used as a justification to write multiple suites of badly done tests and it is all blamed on the architecture.
TL;DR they have a highly partitioned job database, where a job is a delivery of a specific event to a specific destination, and each partition is acted upon by at-most-one worker at a time, so lock contention is only at the infrastructure level.
In that context, each worker can handle a similar balanced workload between destinations, with a fraction of production traffic, so a monorepo makes all the sense in the world.
IMO it speaks to the way in which microservices can be a way to enforce good boundaries between teams... but the drawbacks are significant, and a cross-team review process for API changes and extensions can be equally effective and enable simplified architectures that sidestep many distributed-system problems at scale.
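To make the partitioning summary concrete, an illustrative-only Go sketch (not Segment's actual implementation) of hashing jobs to partitions and holding a per-partition lock so at most one worker touches a partition at a time:

    // Illustrative-only Go sketch of the partitioning idea in the TL;DR above
    // (not Segment's actual implementation). Jobs are hashed to a partition by
    // destination, and a per-partition lock guarantees at-most-one worker per
    // partition, so lock contention stays at the partition level, not per job.
    package main

    import (
        "fmt"
        "hash/fnv"
        "sync"
    )

    const numPartitions = 8

    type job struct {
        Destination string // partition key: each destination maps to one stream
        Payload     string
    }

    var partitionLocks [numPartitions]sync.Mutex

    func partitionFor(key string) int {
        h := fnv.New32a()
        h.Write([]byte(key))
        return int(h.Sum32() % numPartitions)
    }

    func process(j job) {
        p := partitionFor(j.Destination)
        partitionLocks[p].Lock() // at most one worker inside a partition at a time
        defer partitionLocks[p].Unlock()
        fmt.Printf("partition %d delivering %q to %s\n", p, j.Payload, j.Destination)
    }

    func main() {
        var wg sync.WaitGroup
        for _, j := range []job{{"webhook", "e1"}, {"s3", "e2"}, {"webhook", "e3"}} {
            wg.Add(1)
            go func(j job) { defer wg.Done(); process(j) }(j)
        }
        wg.Wait()
    }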
They also failed as a company, which is why that's on Twilio's blog now. So there's that. Undoubtedly their microservices architecture was a bad fit because of how technically focused the product was. But their solution with a monolith didn't have the desired effect either.
Failed? It was a $3.2B acquisition with a total of 283M raised. I don’t see any way that’s a failure.
That said I’m curious if you’re basing this on service degradation you’ve seen since the acquisition. We were thinking of starting to use them - is that a bad move?
By all means use Segment. Segment was a great technology with an incredible technical vision for what they wanted to do. I was in conversations in that office on Market far beyond what they ended up doing post-acquisition.
But a company that can't stand on its own isn't a success in my opinion. Similar things can be said about companies that continue to need round after round of funding without an IPO.
My comment is of the "(2018)" variety. Old news that didn't age well like the people jumping on the "Uber: why we switched to MySQL from Postgres" post. (How many people would choose that decision today?)
People tend to divorce the actual results of a lot of these companies from the gripes of the developers of the tech blogs.
This is not the first time that an engineer working at a big company thinks they are using a monolith when in reality they are a small team in charge of a single microservice, which in turn is part of a company that definitely does not run a monolith.
Last time it was an aws engineer that worked on route 53, and they dismissed microservices in a startup claiming that in AWS they ran a monolith (as in the r53 dns).
Everything is a monolith if you zoom in enough and ignore everything else. Which I guess you can do when you work on a big company and are in charge of a very specific role.
"Microservices is the software industry’s most successful confidence scam. It convinces small teams that they are “thinking big” while systematically destroying their ability to move at all. It flatters ambition by weaponizing insecurity: if you’re not running a constellation of services, are you even a real company? Never mind that this architecture was invented to cope with organizational dysfunction at planetary scale. Now it’s being prescribed to teams that still share a Slack channel and a lunch table.
Small teams run on shared context. That is their superpower. Everyone can reason end-to-end. Everyone can change anything. Microservices vaporize that advantage on contact. They replace shared understanding with distributed ignorance. No one owns the whole anymore. Everyone owns a shard. The system becomes something that merely happens to the team, rather than something the team actively understands. This isn’t sophistication. It’s abdication.
Then comes the operational farce. Each service demands its own pipeline, secrets, alerts, metrics, dashboards, permissions, backups, and rituals of appeasement. You don’t “deploy” anymore—you synchronize a fleet. One bug now requires a multi-service autopsy. A feature release becomes a coordination exercise across artificial borders you invented for no reason. You didn’t simplify your system. You shattered it and called the debris “architecture.”
Microservices also lock incompetence in amber. You are forced to define APIs before you understand your own business. Guesses become contracts. Bad ideas become permanent dependencies. Every early mistake metastasizes through the network. In a monolith, wrong thinking is corrected with a refactor. In microservices, wrong thinking becomes infrastructure. You don’t just regret it—you host it, version it, and monitor it.
The claim that monoliths don’t scale is one of the dumbest lies in modern engineering folklore. What doesn’t scale is chaos. What doesn’t scale is process cosplay. What doesn’t scale is pretending you’re Netflix while shipping a glorified CRUD app. Monoliths scale just fine when teams have discipline, tests, and restraint. But restraint isn’t fashionable, and boring doesn’t make conference talks.
Microservices for small teams is not a technical mistake—it is a philosophical failure. It announces, loudly, that the team does not trust itself to understand its own system. It replaces accountability with protocol and momentum with middleware. You don’t get “future proofing.” You get permanent drag. And by the time you finally earn the scale that might justify this circus, your speed, your clarity, and your product instincts will already be gone."
Is it 2018? Are you guys going to repost the MySQL DB as a queue story again? Perhaps an announcement that you’re migrating to Java 9 and what you learned about generics?
I left Twilio in 2018. I spent a decade at SendGrid. I spent a short time at Segment.
The shitty arch is not a point against (micro)services. SendGrid, another Twilio property, uses (micro)services to great effect. Services there were fully independently deployable.
The whole point of micro-services is to manage dependencies independently across service boundaries, using the API as the contract, not the internal libraries.
Then you can implement a service in Java, Python, Rust, C++, etc, and it doesn't matter.
Coupling your Postgres DB to your Elasticsearch cluster via a hard library dependency is impossibly heavy. The same insight applies to your bespoke services.
They have a monolith but struggle with individual subsystem failures bringing down the whole thing. Sounds like they would benefit from Elixir’s isolated, fail-fast architecture.
Great writeup. Much of this is more about testing, how package dependencies are expressed, and many-repo/single-repo tradeoffs than about "microservices"!
Maintaining and testing a codebase containing many external integrations ("Destinations") was one of the drivers behind the earlier decision to shatter into many repos: it isolated the impact of Destination-specific test suite failures caused by tests that were actually exercising integrations with external 3rd party services.
One way to think about that situation is in terms of packages, their dependency structure, how those dependencies are expressed (e.g. decoupled via versioned artefact releases, directly coupled via monorepo style source checkout), their rates of change, and the quality of their automated tests suites (high quality meaning the test suite runs really fast, tests only the thing it is meant to test, has low rates of false negatives and false positives, low quality meaning the opposite).
Their initial situation was one that rapidly becomes unworkable: a shared library package undergoing a high rate of change depended on by many Destination packages, each with low quality test suites, where the dependencies were expressed in a directly-coupled way by virtue of everything existing in a single repo.
There's a general principle here: multiple packages in a single repo with directly-coupled dependencies, where those packages have test suites with wildly varying levels of quality, quickly becomes a nightmare to maintain. The packages with low quality test suites that depend upon high quality, rapidly changing shared packages generate spurious test failures that need to be triaged and slow down development. Maintainers of packages that depend upon a rapidly changing shared package but do not have high quality test suites able to detect regressions may find their package frequently gets broken without anyone realising in time.
Their initial move solved this problem by shattering the single repo and trading directly-coupled dependencies for decoupled, versioned dependencies, which decoupled the rate of change of the shared package from the per-Destination packages. That was an incremental improvement, but it added the complexity and overhead of maintaining multiple versions of the "shared" library and per-repo boilerplate, which grows over time as more Destinations are added and more changes are made to the shared library while the work to upgrade and retest each Destination against it is deferred.
Their later move was to reverse this, go back to directly-coupled dependencies, but instead improve the quality of their per-Destination test suites, particularly by introducing record/replay style testing of Destinations. Great move. This means that the test suite of each Destination is measuring "is the Destination package adhering to its contract in how it should integrate with the 3rd party API & integrate with the shared package?" without being conflated with testing stuff that's outside of the control of code in the repo (is the 3rd party service even up, etc).
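To make that concrete, here is a minimal sketch of what record/replay-style testing of a Destination can look like. The post doesn't say what tooling or language Segment used, so this assumes Python with the vcrpy library; the destination URL and the send_to_destination function are invented for illustration.

    # test_webhook_destination.py -- a hedged sketch, not Segment's actual test suite.
    # First run: vcrpy records the real HTTP exchange into a "cassette" file.
    # Later runs: the recorded response is replayed, so the test no longer depends
    # on the third-party service being up, only on our code honoring its contract.
    import requests
    import vcr

    DESTINATION_URL = "https://api.example-destination.test/v1/track"  # hypothetical

    def send_to_destination(event: dict) -> int:
        """The code under test: forward an analytics event to the destination."""
        resp = requests.post(DESTINATION_URL, json=event, timeout=5)
        return resp.status_code

    @vcr.use_cassette("fixtures/track_event.yaml")
    def test_track_event_is_accepted():
        status = send_to_destination({"type": "track", "event": "Signed Up"})
        assert status == 200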
> Microservices is a service-oriented software architecture in which server-side applications are constructed by combining many single-purpose, low-footprint network services.
Gonna stop you right there.
Microservices have nothing to do with the hosting or operating architecture.
Per Martin Fowler, who formalized the term, microservices are:
“In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery”
You can have an entirely local application built on the “microservice architectural style.”
Saying they “often” use an HTTP resource API is beside the point.
The problem Twilio actually describes is that they messed up service granularity and distributed-systems engineering processes.
Twilio's experience was not a failure of the microservice architectural style. This was a failure to correctly define service boundaries based on business capabilities.
Their struggles with serialization, network hops, and complex queueing were symptoms of building a distributed monolith, which they finally made explicit with this move. They accidentally built a system with the overhead of distribution but the tight coupling of a single application. Now they are making their architectural foundations fit what they built, likely because they planned it poorly.
The true lesson is that correctly applying microservices requires insanely hard domain modeling and iteration and meticulous attention to the "Distributed Systems Premium."
Just because he says something does not mean Fowler “formalized the term”. Martin wrote about every topic under the sun, and he loved renaming and/or redefining things to fit his world view, which incidentally drove people not just to his blog but also to his consultancy, Thoughtworks.
PS: The “single application” line shows how dated Fowler's views were then and certainly are today.
I've been developing under that understanding since before Fowler-said-so. His take is simply a description of a phenomenon predating the moniker of microservices. SOA with things like CORBA, WSDL, UDDI, Java services in app servers etc. was a take on service oriented architectures that had many problems.
Anyone who has ever developed in a Java codebase with "Service" and "ServiceImpl"s everywhere can see the lineage of that model. Services were supposed to be the API, with the implementation provided in a separate process container. Microservices signalled a time when SOA without Java as a prerequisite had been successful in large tech companies. They had reached the point of needing an even more granular breakout and a reduction of reliance on Java; HTTP interfaces were an enabler of that. 2010s-era microservices people never understood the basics, and many don't even know what they're criticizing.
I feel like microservices have gotten a lot easier in the 7 years since Twilio experienced this, not just in my experience but through refinements in architectures.
There are infinite permutations in architecture, and we've collectively narrowed them down to things that are cheap to deploy, automatically scale at low cost, and are easily replicable with a simple script.
We should be talking about how AI knows those scripts too and can synthesize adjustments. Dedicated Site Reliability Engineers and DevOps are great for maintaining convoluted legacy setups, but irrelevant for doing the same thing from scratch nowadays.
You know what I think is better than a push of the CPU stack pointer and a jump to a library?
A network call. Because nothing could be better for your code than putting the INTERNET into the middle of your application.
--
The "micro" of microservices has always been ridiculous.
If it can run on one machine then do it. Otherwise you have to deal with networking. Only do networking when you have to. Not as a hobby, unless your program really is a hobby.
Microservices have nothing to do with the underlying hosting architecture. Microservices can all run and communicate on a single machine. There will be a local network involved, but it absolutely does not require the internet or multiple machines.
Well implemented network hardware can have high bandwidth and low latency. But that doesn't get around the complexity and headaches it brings. Even with the best fiber optics, wires can be cut or tripped over. Controllers can fail. Drivers can be buggy. Networks can be misconfigured. And so on. Any request - even sent over a local network - can and will fail on you eventually. And you can't really make a microservice system keep working properly when links start failing.
Local function calls are infinitely more reliable. The main operational downside with a binary monolith is that a bug in one part of the program will crash the whole thing. Honestly, I still think Erlang got it right here with supervisor trees. Use "microservices". But let them all live on the same computer, in the same process. And add tooling to the runtime environment to allow individual "services" to fail or get replaced without taking down the rest of the system.
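A toy sketch of that idea, with plain Python threads standing in for Erlang processes (names like supervise and flaky_worker are invented; real OTP supervisors also give you restart strategies and genuine process isolation, which this does not):

    # A toy supervisor: run each in-process "service" in its own thread and
    # restart it if it crashes, without taking down the rest of the process.
    import threading, time, traceback

    def supervise(name, service_fn, max_restarts=5):
        def runner():
            restarts = 0
            while restarts <= max_restarts:
                try:
                    service_fn()           # blocks until the service exits or crashes
                except Exception:
                    traceback.print_exc()
                    restarts += 1
                    print(f"[supervisor] restarting {name} ({restarts}/{max_restarts})")
                    time.sleep(1)          # crude back-off
                else:
                    break                  # clean exit, don't restart
        t = threading.Thread(target=runner, name=name, daemon=True)
        t.start()
        return t

    def flaky_worker():
        # stand-in for a "service"; crashes occasionally
        time.sleep(0.1)
        raise RuntimeError("boom")

    if __name__ == "__main__":
        supervise("worker", flaky_worker)
        time.sleep(3)  # the rest of the "system" keeps running while the worker restarts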
These "we moved from X to Y" posts are like Dunning-Kruger humblebrags. Yes, we all lack information and make mistakes. But there's never an explanation in these posts of how they've determined their new decision is any less erroneous than their old decision. It's like they threw darts at a wall and said "cool, that's our new system design (and SDLC)". If you have not built it yourself before, and have not studied in depth an identical system, just assume you are doing the wrong thing. Otherwise you are running towards another Dunning-Kruger pit.
If you have a company that writes software, please ask a professional software/systems architect to review your plans before you build. The initial decisions here would be a huge red flag to any experienced architect, and the subsequent decisions are full of hidden traps, and are setting them up for more failure. If you don't already have a very skilled architect on staff (99% chance you don't) you need to find one and consult with them. Otherwise your business will suffer from being trapped in unnecessary time-consuming expensive rework, or worse, the whole thing collapsing.
The “distributed monolith” line is the key takeaway here.
Microservices only buy you something if teams can deploy, version, and reason about them independently. Once shared libraries or coordinated deploys creep in, you’ve taken on all the operational cost with none of the autonomy benefits.
I’ve seen monoliths with clear module boundaries outperform microservice setups by an order of magnitude in developer throughput.
If switching between microservices and a monolith is giving you an order-of-magnitude improvement in productivity, you were clearly doing something wrong or had terrible practices.
I think the blog post is confusing in this regard. For example, it explicitly states:
> We no longer had to deploy 140+ services for a change to one of the shared libraries.
Taken in isolation, that is a strong indicator that they were indeed running a distributed monolith.
However, the blog post earlier on said that different microservices were using different versions of the library. If that was actually true, then they would never have to deploy all 140+ of their services in response to a single change in their shared library.
Take a shared telemetry library: you realize that you are missing an important metric to operationalize your services. You now need to deploy all 140 to get the benefit.
Your runtime version is out of date / end of life. You now need to update and deploy all 140 (or at least all the ones that use the same tech stack).
No matter how you slice it, there are always dependencies across all services because there are standards in the environment in which they operate, and there are always going to be situations where you have to redeploy everything or large swaths of things.
Microservices aren’t a panacea. They just let you delay the inevitable but there is gonna be a point where you’re forced to comply with a standard somewhere that changes in a way that services must be updated. A lot of teams use shared libraries for this functionality.
These are great examples. I'll add one more. Object names and metadata definitions. Figuring out what the official name for something is across systems, where to define the source of truth, and who maintains it.
Why do all services need to understand all these objects though? A service should as far as possible care about its own things and treat other services' objects as opaque.
... otherwise you'd have to do something silly like update every service every time that library changed.
As you mention, it said early on that they were using different versions for each service:
I believe the need to deploy 140+ services came out of wanting to fix this by using the latest version of the deps everywhere, and then to stay on top of it so it does not deteriorate in the same way (and possibly when they had things like a security fix).
Except if it's a security fix?
The blog post says that they had a microservice architecture, then introduced some common libraries which broke the assumptions of compatibility across versions, forcing mass updates if a common dependency was updated. This is when they realized that they were no longer running a microservice architecture, and fused everything into a proper monolith. I see no contradiction.
See my response to a sibling comment: they did not have "forced" updates and they really ended up with:
Which is sort of fine, in my book. Update to the latest version of dependencies opportunistically, when you introduce other changes and roll your nodes anyway. Because you have well-defined, robust interfaces between the microservices, such that they don't break when a dependency far down the stack changes, right?
If a change requires cascading changes in almost every other service then yes, you're running a distributed monolith and have achieved zero separation of services. Doesn't matter if each "service" has a different stack if they are so tightly coupled that a change in one necessitates a change in all. This is literally the entire point of micro-services. To reduce the amount of communication and coordination needed among teams. When your team releases "micro-services" which break everything else, it's a failure and hint of a distributed monolith pretending to be micro-services.
As I said, they mention having a problem where each service depended on different versions of internal shared libraries. That indicates they did not need to update all at once:
FWIW, I think it was a great write up. It's clear to me what the rationale was and had good justification. Based on the people responding to all of my comments, it is clear people didn't actually read it and are opining without appropriate context.
> There is no one size fits all.
Totally agree. For what it's worth, based on the limited information in the article, I actually do think it was the right decision to pull all of the per-destination services back into one. The shared library problem can go both ways, after all: maybe the solution is to remove the library so your microservices are fully independent, or maybe they really should have never been independent in the first place and the solution is to put them back together.
I don't think either extreme of "every line of code in the company is deployed as one service" or "every function is an independent FaaS" really works in practice, it's all about finding the right balance, which is domain-specific every time.
Having seen similar patterns play out at other companies, I'm curious about the organizational dynamics involved. Was there a larger dev team at the time you adopted microservices? Was there thinking involved like "we have 10 teams, each of which will have strong, ongoing ownership of ~14 services"?
Because from my perspective that's where microservices can especially break down: attrition or layoffs resulting in service ownership needing to be consolidated between fewer teams, which now spend an unforeseen amount of their time on per-service maintenance overhead. (For example, updating your runtime across all services becomes a massive chore, one that is doable when each team owns a certain number of services, but a morale-killer as soon as some threshold is crossed.)
I disagree. Both can be true at the same time. A good design should not point to library-latest in a production setting; it should point to a stable, known-good version via a direct reference, e.g. library-1.0.0-stable.
However, in the world we live in, people choose to point to latest to avoid manual work, trusting that other teams did the right diligence when publishing the latest version.
You can point to a stable version in the model I described and still be distributed and a microservice, while depending on a shared service or repository.
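Concretely, the difference in a pip-style requirements file might look like this (the library name is made up; any package manager has equivalents):

    # floating: whatever "latest" happens to be at build time; this is the failure
    # mode described above, where another team's 2.0 release can break your next build
    company-shared-lib

    # pinned: a known-good version, upgraded deliberately by the owning team
    company-shared-lib==1.0.0

    # middle ground: automatically take backwards-compatible patch releases only
    company-shared-lib~=1.0.0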
You can do that but you keep missing that you’re no longer a true microservice as originally defined and envisioned, which is that you can deploy the service independently under local control.
Can you imagine if Google could only release a new API if all their customers simultaneously updated to that new API? You need loose coupling between services.
OP is correct that you are indeed now in a weird hybrid monolith application where it’s deployed piecemeal but can’t really be deployed that way because of tightly coupled dependencies.
Be ready for a blog post in ten years about how they broke apart the monolith into loosely coupled components because it was too difficult to ship things with a large team and actually have them land in production without getting reverted due to an unrelated issue.
Internal and external have wildly different requirements. Google internally can't update a library unless the update is either backward-compatible for all current users or part of the same change that updates all those users, and that's enforced by the build/test harness. That was an explicit choice, and I think an excellent one, for that scenario: it's more important to be certain that you're done when you move forward, so that it's obvious when a feature no longer needs support, than it is to enable moving faster in "isolation" when you all work for the same company anyway.
But also, you're conflating code and services. There's a huge difference between libraries that are deployed as part of various binaries and those that are used as remote APIs. If you want to update a utility library that's used by importing code, then you don't need simultaneous deployment, but you would like to update everywhere to get it done with - that's only really possible with a monorepo. If you want to update a remote API without downtime, then you need a multi-phase rollout where you introduce a backward-compatibility mode... but that's true whether you store the code in one place or two.
The whole premise of microservices is loose coupling - external just makes it plainly obvious that it’s a non starter. If you’re not loosely coupling you can call it microservices but it’s not really.
Yes I understand it’s a shared library but if updating that shared library automatically updates everyone and isn’t backward compatible you’re doing it wrong - that library should be published as a v2 or dependents should pin to a specific version. But having a shared library that has backward incompatible changes that is automatically vendored into all downstream dependencies is insane. You literally wouldn’t be able to keep track of your BOM in version control as it obtains a time component based on when you built the service and the version that was published in the registry.
> if updating that shared library automatically updates everyone and isn’t backward compatible you’re doing it wrong that library should be published as a v2 or dependents should pin to a specific version
...but why? You're begging the question.
If you can automatically update everyone including running their tests and making any necessary changes to their code, then persisting two versions forever is a waste of time. If it's because you can't be certain from testing that it's actually a safe change, then fine, but note that that option is still available to you by copy/pasting to a v2/ or adding a feature flag. Going to a monorepo gives you strictly more options in how to deal with changes.
> You literally wouldn’t be able to keep track of your BOM in version control as it obtains a time component based on when you built the service
This is true regardless of deployment pattern. The artifact that you publish needs to have pointers back to all changes that went into it/what commit it was built at. Mono vs. multi-repo doesn't materially change that, although I would argue it's slightly easier with a monorepo since you can look at the single history of the repository, rather than having to go an extra hop to find out what version 1.0.837 of your dependency included.
> the version that was published in the registry
Maybe I'm misunderstanding what you're getting at, but monorepo dependencies typically don't have a registry - you just have the commit history. If a binary is built at commit X, then all commits before X across all dependencies are included. That's kind of the point.
> ...but why? You're begging the question. If you can automatically update everyone including running their tests and making any necessary changes to their code, then persisting two versions forever is a waste of time.
I’m not begging the question. I’m simply stating what loose coupling looks like and the blog post is precisely the problem of tight coupling. If you have multiple teams working on a tightly coupled system you’re asking for trouble. This is why software projects inevitably decompose against team boundaries and you ship your org chart - communication and complexity is really hard to manage as the head count grows which is where loose coupling helps.
But this article isn’t about moving from federated codebases to a single monorepo as you propose. They used that as an intermediary step to then enable making it a single service. But the point is that making a single giant service is well studied and a problem. Had this constantly at Apple when I worked on CoreLocation where locationd was a single service that was responsible for so many things (GPS, time synchronization of Apple Watches, WiFi location, motion, etc) that there was an entire team managing the process of getting everything to work correctly within a single service and even still people constantly stepped on each other’s toes accidentally and caused builds that were not suitable. It was a mess and the team that should have identified it as a bottleneck in need of solving (ie splitting out separate loosely coupled services) instead just kept rearranging deck chairs.
> Maybe I'm misunderstanding what you're getting at, but monorepo dependencies typically don't have a registry - you just have the commit history
I’m not opposed to a monorepo which I think may be where your confusion is coming from. I’m suggesting slamming a bunch of microservices back together is a poorly thought out idea because you’ll still end up with a launch coordination bottleneck and rolling back 1 team’s work forces other teams to roll back as well. It’s great the person in charge got to write a ra ra blog post for their promo packet. Come talk to me in 3 years with actual on the ground engineers saying they are having no difficulty shipping a large tightly coupled monolithic service or that they haven’t had to build out a team to help architect a service where all the different teams can safely and correctly coexist. My point about the registry is that they took one problem - a shared library multiple services depend on through a registry depend on latest causing problems deploying - and nuked it from orbit using a monorepo (ok - this is fine and a good solution - I can be a fan of monorepos provided your infrastructure can make it work) and making a monolithic service (probably not a good idea that only sounds good when you’re looking for things to do).
> I’m not begging the question. I’m simply stating what loose coupling looks like and the blog post is precisely the problem of tight coupling.
But it is not! They were updating dependencies and deploying services separately, and this led to every one of 140 services using a different version of "shared-foo". This made it cumbersome, confusing and expensive to keep going (you want a new feature from shared-foo, you have to take all the other features too, unless you fork and cherry-pick on top, which makes it not shared-foo anymore).
The point is that a true microservices approach will always lead to exactly one of these situations: a) you do not extract shared functions and live with duplicate implementations, b) you enforce keeping your shared dependencies always very-close-to-latest (which you can do with different strategies; a monorepo is one that enables but does not require it), or c) you end up with a mess of versions being used by each individual service.
The most common middle ground is to insist on backwards compatibility in a shared-lib, but carrying that over 5+ years is... expensive. You can mix it with an "enforce update" approach ("no version older than 2 years can be used"), but all the problems are pretty evident and expected with any approach.
I'd always err on the side of having the capability to upgrade everything at once if needed, while keeping the ability to keep a single service on a pinned version. This is usually not too hard with any approach, though a monorepo makes the first one appear easier (you edit one file, or multiple dep files in a single repo). But unless you can guarantee all services get replaced in a deployment at exactly the same moment (which you rarely can) or can accept short-lived inconsistencies, deployment requires all services to be backwards compatible until they are all updated, with either approach.
I'd also say that this is still not a move to a monolith, but to a Service-Oriented-Architecture that is not microservices (as microservices are also SOA): as usual, the middle ground is the sweet spot.
To reference my other comment. This thread is about the nuance of if a dependency on a shared software repository means you are a microservice or not. I'm saying it's immaterial to the definition.
A dependency on an external software repository does not make a microservice no longer a microservice. It's the deployment configuration around said dependency that matters.
What everyone else is saying is that the core value proposition of microservices is that they are independently deployable (which I believe is what you are aiming for as well), which means that there is no tight coupling between them.
If one introduces tight coupling by having a shared library that gets updated in a backwards-incompatible way and needs to be updated simultaneously in each microservice, you move away from a microservices architecture, as your services are not independently deployable anymore.
So in the general case, it is immaterial, but in practice, it can be a mechanism which introduces tight coupling and negates the core value of the microservices architecture.
Here, it was done on purpose as a step to a more monolithic architecture (though it was still only a single service in a larger system, so I'd avoid the "monolith" term).
> Be ready for a blog post in ten years how they broke apart the monolith into loosely coupled components because it was too difficult to ship things with a large team and actually have it land in production without getting reverted to an unrelated issue.
With some of their "solutions" I do wonder how they plan on resolving things, like the black-box "magic" queue service they subbed back in, or the fault-tolerance problem.
That said, I do think if you have a monolith that just needs to scale (single service that has to send to many places), they are possibly taking the correct approach. You can design your code/architecture so that you can deploy "services" separately, in a fault tolerant manner, but out of a mono repo instead of many independent repos.
They don't have a monolith: they have a service that has a restricted domain of responsibility matched to the team that runs it.
There is nothing magic about their queue service, and it seems correctly tuned to the complexity that they've got to cover: yes, just like most queue implementations, it will get different types of messages (events). If anything, their previous implementation was too complex which caused lots of waste.
With hindsight, they should have evolved their original architecture into exactly what they pivoted to now: better fault tolerance in "processors" of different types.
I would hope that my general rule of "only solve exactly the problem you have in front of you" would have avoided the approach they took, but engineers love to abstract away things and introduce indirection layers and add accidental complexity that way. And ofc, "microservices great, me want microservices" too :)
Again, I am not saying this as a slight: I believe many of us have learned the limits of microservices by, well, living through them :) And now we tune our abstraction layers differently.
My issue isn’t with the monorepo but with slamming all the microservices into a single monolithic service (the last part of the blog post).
> Can you imagine if Google could only release a new API if all their customers simultaneously updated to that new API? You need loose coupling between services.
Internal Google services: *sweating profusely*
(Mostly in jest, it's obviously a different ballgame internal to the monorepo on borg)
You're both right, but talking past each other. You're right that shared dependencies create a problem, but it can be the problem without semantically redefining the services themselves as a shared monolith. Imagine someone came to you with a similar problem and you concluded "distributed monolith", which may lead them to believe that their services should be merged into a single monolith. What if they then told you that it's going to be tough because these were truly separate apps, but that used the same OS wide Python install, one ran on Django/Postgres, another on Flask/SQLite, and another was on Fastapi/Mongo, but they all relied on some of the same underlying libs that are frequently updated. The more accurate finger should point to bad dependency management and you'd tell them about virtualenv or docker.
The dependencies they're likely referring to aren't core libraries, they're shared interfaces. If you're using protobufs, for instance, and you share the interfaces in a repo, updating Service A's interface(s) necessitates updating all services that communicate with it as well (whether you utilize those changes or not). Larger systems generally maintain a true dependency-management tree, but for smaller/scrappier teams that is out of scope, so they just redeploy everything in a domain.
> If you're using protobufs, for instance, and you share the interfaces in a repo. Updating Service A's interface(s) necessitates all services dependent on communicating with it to be updated as well (whether you utilize those changes or not).
This is not true! This is one of the core strengths of protobuf. Non-destructive protobuf changes, such as adding new API methods or new fields, do not require clients to update. On the server-side you do need to handle the case when clients don't send you the new data--plus deal with the annoying "was this int64 actually set to 0 or is it just using the default?" problem--but as a whole you can absolutely independently update a protobuf, implement it on the server, and existing clients can keep on calling and be totally fine.
Now, that doesn't mean you can go crazy, as doing things like deleting fields, changing field numbering or renaming APIs will break clients, but this is just the reality of building distributed systems.
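For example, a backwards-compatible evolution of a hypothetical message looks like this; only the field numbers matter on the wire, so clients compiled against v1 keep working and simply never set the new field:

    // v1 of a hypothetical event message
    message TrackEvent {
      string user_id = 1;
      string event = 2;
    }

    // v2 adds a field. Old clients still serialize and parse fine; on the
    // server, an unset received_at reads as 0, so treat 0 as "not provided".
    message TrackEvent {
      string user_id = 1;
      string event = 2;
      int64 received_at = 3; // new field number; never reuse or renumber old ones
    }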
What you are talking about is simply keeping the API (whether a library or a service) backwards-compatible. There are plenty strategies to achieve that, and it can be done with almost any interface layer (HTTP, protobuf, JSON, SQL, ...).
I was oversimplifying for the sake of example, but yes, you are correct. Properly managed protobufs don't require an update on strict interface expansion, so they shouldn't always require a redeploy.
Oh god no.
I mean I suppose you can make breaking changes to any API in any language, but that’s entirely on you.
Right, but there's a cost to having to support 12 different versions of a library in your system.
It's a tradeoff.
No, that sounds like you're breaking backwards compatibility too often.
Assuming there is an update every week, you can expect all teams to update within two weeks, which means if everything goes well only two versions are active.
That's not how things play out in practice. Say you do backwards incompatible changes "infrequently", say every 4 months, or 3 times per year. In 5 years, that's 15 versions with backwards incompatible changes.
Everybody is time pressured at least sometimes, and you miss one update, and now you've got multiple backwards-incompatible updates and you need to do it carefully the next time around, meaning more time needed, meaning it needs to be scheduled and planned along with other feature work now.
And then you end up with something close to a normal distribution of versions across ~140 services: some have the very old versions (v1-v4), majority are on some middle, not ancient versions but still ~7 versions behind on average (v5-v10), and only some are on the latest few versions (v11-v15). "Patch" versions can become even crazier, and yes, there will be bugs in them making them inadvertently not backwards compatible either (as not everybody updated right away to detect it).
But really, I always point out to good, long lived APIs that make compromises in their API for the sake of backwards compatibility (eg. we still live with "Referer" instead of "Referrer" in HTTP, 35 years later; and it is OK!).
> If you do find yourself in a situation where you can't upgrade a core library like e.g. SQLAlchemy or Spring, or the underlying Python/Java/Go/etc runtime, without requiring updates to every service, you are back in the realm of a distributed monolith.
Show me a language runtime or core library that will never have a CVE. Otherwise, by your definition, microservices don’t exist and all service oriented architectures are distributed monoliths.
Yes, you’re describing a distributed monolith. Microservices are independent, with nothing shared. They define a public interface and that’s it, that’s the entire exposed surface area. You will need to do major version bumps sometimes, when there are backwards incompatible changes to make, but these are rare.
The logical problem you’re running into is exactly why microservices are such a bad idea for most businesses. How many businesses can have entirely independent system components?
Almost all “microservice” systems in production are distributed monoliths. Real microservices are incredibly rare.
A mental model for true microservices is something akin to depending on the APIs of Netflix, Hulu, HBO Max and YouTube. They’ll have their own data models, their own versioning cycles and all that you consume is the public interface.
I'm trying to understand what you see as a really independent service with nothing shared.
For instance, suppose company A uses the GCP logging stack, and company B does the same. GCP updates its product in a way that strongly encourages upgrading within a specific time frame (e.g. the price will drastically increase otherwise), so A and B do it mostly at the same time for the same reason.
Are A and B truly independent under your vision, or are they a company-spanning monolith?
> mostly at the same time
Mostly? If you can update A one week and B the next week with no breakage in between, that seems pretty independent.
This was also the case for the micro-service situation described in the article. From TFA:
> Over time, the versions of these shared libraries began to diverge across the different destination codebases.
I don't see the problem?
There's at least one employee per micro service so there should be zero problems preventing just bumping the version of the library.
This Segment team was 3 people and 140 services. Microservices are best at solving org coordination issues where teams step on each other. This is a case of a team stepping on itself.
This type of elitist mentality is such a problem and such a drain for software development. "Real micro services are incredibly rare". I'll repeat myself from my other post, by this level of logic nothing is a micro service.
Do you depend on a cloud provider? Not a microservice. Do you depend on an ISP for Internet? Not a microservice. Depend on humans to do something? Not a microservice.
Textbook definitions and reality rarely coincide, rather than taking such a fundamentalist approach that leads nowhere, recognize that for all intents and purposes, what I described is a microservice, not a distributed monolith.
It's fine to have dependencies, the point is two services that need to be deployed at the same time are not independent microservices.
Yes, the user I'm replying to is suggesting that taking on a dependency of a shared software repository makes the service no longer a microservice.
That is fundamentally incorrect. As presented in my other post, you can correctly use the shared repository as a dependency and refer to a stable version rather than a dynamic one, which is where the problem actually arises.
The problem with having a shared library which multiple microservices depend on isn’t on the microservice side.
As long as the microservice owners are free to choose what dependencies to take and when to bump dependency versions, it’s fine - and microservice owners who take dependencies like that know that they are obliged to take security patch releases and need to plan for that. External library dependencies work like that and are absolutely fine for microservices to take.
The problem comes when you have a team in the company that owns a shared library, and where that team needs, in order to get their code into production, to prevail upon the various microservices that consume their code to bump versions and redeploy.
That is the path to a distributed monolith situation and one you want to avoid.
Yes we are in agreement. A dependency on an external software repository does not make a microservice no longer a microservice. It's the deployment configuration around said dependency that matters.
"by this level of logic nothing is a micro service"
Yes, exactly. The point is not elitism. Microservices are a valuable tool for a very specific problem but what most people refer to as "microservices" are not. Language is important when designing systems. Microservices are not just a bunch of separately deployable things.
The "micro" in "microservice" doesn't refer to how it is deployed, it refers to how the service is "micro" in responsibility. The service has a public interface defined in a contract that other components depend on, and that is it, what happens within the service is irrelevant to the rest of the system and vice verse, the service does not have depend on knowledge of the rest of the system. By virtue of being micro in responsibility, it can be deployed anywhere and anyhow.
If it is not a microservice, it is just a service, and when it is just a service, it is probably a part of a distributed monolith. And that is okay, a distributed monolith can be very valuable. The reason many people bristle at the mention of microservices is that they are often seen as an alternative to a monolith but they are not, it is a radically different architecture.
We must be precise in our language because if you or I build a system made up of "microservices" that aren't microservices, we're taking on all of the costs of microservices without any of the benefits. You can choose to drive to work, or take the bus, but you cannot choose to drive because it is the cheapest mode of transport or walk because it is the fastest. The costs and benefits are not independent.
The worst systems I have ever worked on were "microservices" with shared libraries. All of the costs of microservices (every call now involves a network) and none of the benefits (every service is dependent on the others). The architect of that system had read all about how great microservices are and understood it to mean separately deployable components.
There is no hierarchy of goodness, we are just in pursuit of the right tool for the job. A monolith, distributed monolith or a microservice architecture could be the right tool for one problem and the wrong tool for another.
https://www.youtube.com/watch?v=y8OnoxKotPQ
> The "micro" in "microservice" doesn't refer to how it is deployed, it refers to how the service is "micro" in responsibility.
The "micro" in microservice was a marketing term to distinguish it from the bad taste of particular SOA technology implementations in the 2000s. A similar type of activity as crypto being a "year 3000 technology."
The irony is it was the common state that "services" weren't part of a distributed monolith. Services which were too big were still separately deployable. When services became nothing but an HTTP interface over a database entity, that's when things became complicated via orchestration; orchestration previously done by a service... not done to a service.
>We must be precise in our language
I am talking about using a shared software repository as a dependency. Which is valid for a microservice. Taking said dependency does not turn a microservice into a monolith.
It may be a build time dependency that you do in isolation in a completely unrelated microservice for the pure purpose of building and compiling your business microservice. It is still a dependency. You cannot avoid dependencies in software or life. As Carl Sagan said, to bake an apple pie from scratch, you must first invent the universe.
>The worst systems I have ever worked on were "microservices" with shared libraries.
Ok? How is this relevant to my point? I am only referring to the manner in which your microservice references said libraries, not the pros or cons of implementing or using shared libraries (e.g. mycompany-specific-utils), common libraries (e.g. apache-commons), or any software component for that matter.
>Yes, exactly
So you're agreeing that there is no such thing as a microservice. If that's the case, then the term is pointless other than a description of an aspirational yet unattainable state. Which is my point exactly. For the purposes of the exercise described the software is a microservice.
> Taking said dependency does not turn a microservice into a monoloth.
True. However one of the core tenets of microservices is that they should be independently deployable[1][2].
If taking on such a shared dependency does not interfere with them being independently deployable then all is good and you still have a set of microservices.
However, if that shared dependency couples the services so that when one needs a new version of the shared dependency then all do, well, suddenly those services are no longer microservices but a distributed monolith.
[1]: https://martinfowler.com/microservices/
[2]: https://www.oreilly.com/content/a-quick-and-simple-definitio...
And if my grandmother had wheels she would be a bike
There are categories, and ontologies are real things in the world. If you create one thing and call it something else, that doesn’t mean the definition of “something else” should change.
By your definition it is impossible to create a state based on coherent specifications because most states don’t align to the specification.
We know for a fact that’s wrong via functional programming, state machines, and formal verification
Needing to upgrade a library everywhere isn’t necessarily a sign of inappropriate coupling.
For example, a library with a security vulnerability would need to be upgraded everywhere regardless of how well you’ve designed your system.
In that example the monolith is much easier to work with.
While you're right, I can only think of twice in my career where there was a "code red all services must update now", which were log4shell and spectre/meltdown (which were a bit different anyway). I just don't think this comes up enough in practice to be worth optimizing for.
You have not been in the field very long then, I presume? There are multiple per year that require all hands on deck, depending on your tech stack. Just look at the recent NPM supply chain attacks.
You presume very incorrectly to say the least.
The npm supply chain attacks were only an issue if you don't use lock files. In fact they were a great example of why you shouldn't blindly upgrade to the latest packages when they are available.
Fair enough, which is why I called out my assumption:).
I'm referring to the all hands on deck nature of responding to security issues not the best practice. For many, the NPM issue was an all hands on deck.
Wait what? I've been wondering why people have been fussing over supply chain vulnerabilities, but I thought they mostly meant "we don't want to get unlucky and upgrade, merge the PR, test, and build the container before the malicious commit is pushed".
Who doesn't use lockfiles? Aren't they the default everywhere now? I really thought npm uses them by default.
We use pretty much the entire nodejs ecosystem, and only the very latest Next.js vulnerability was an all hands on deck vulnerability. That’s taken over the past 7 years.
You solve a bunch of them by not using JavaScript in the backend, though.
To add to this conversation from our other thread: you solve a bunch of problems that are nearly as bad by not using microservices, yet you still do. And that is the same reason why people use JavaScript despite the issues it introduces. It’s not like you’re the only person in the industry who hasn’t used a technology that irrationally introduces horrible consequences.
I mean I just participated in a Next JS incident that required it this week.
It has been rare over the years but I suspect it's getting less rare as supply chain attacks become more sophisticated (hiding their attack more carefully than at present and waiting longer to spring it).
NextJS was just bog standard “we designed an insecure API and now everyone can do RCE” though.
Everyone has been able to exploit that for ages. It only became a problem when it was discovered and publicised.
A library which patches a security vulnerability should do so by bumping a patch version, maintaining backward compatibility. Taking a patch update to a library should mean no changes to your code, just rerun your tests and redeploy.
If libraries bump minor or major versions, they are imposing work on all the consuming services to accept the version, make compatibility changes, test and deploy.
Example: log4j. That was an update fiasco everywhere.
1 line change and redeploy
Works great if you are the product owner. We ended up having to fire and replace about a dozen 3rd party vendors over this.
This is pedantic, but no, it doesn't need to be updated everywhere. It should be updated as fast as possible, but there isn't a dependency chain there.
It’s easy to say things like this, but it’s also incredibly difficult to know if you’ll introduce subtle bugs or incompatibilities between services. It’s an example of people following the microservices pattern and then taking on additional risk and deployment problems that are not immediately obvious when buying in!
So let’s say you have a shared money library that you have fixed a bug in… what would you do in the real world - redeploy all your services that use said library or something else?
> It’s easy to say things like this but also incredibly difficult to know if you’ll introduce subtle bugs or incompatibilities between services.
You are right: it is difficult. It is harder than building a monolith. No argument there. I just don't think proper microservices are as difficult as people think. It's just more of a mindset shift.
Plenty of projects and companies continue to release backwards compatible APIs: operating systems, Stripe/PayPal, cloud providers. Bugs come up, but in general people don't worry about ec2:DescribeInstances randomly breaking. These projects are still evolving internally while maintaining a stable external API. It's a skill, but something that can be learned.
> So let’s say you have a shared money library that you have fixed a bug in… what would you do in the real world - redeploy all your services that use said library or something else?
In the real world I would not have a shared "money library" to begin with. If there were money-related operations that needed to be used by multiple services, I would have a "money service" which exposed an API and could be deployed independently. A bug fix would then be a deploy to this service, and no other services would have to update or be aware of the fix.
This isn't a theoretical, either, as a "payments service" that encapsulates access to payment processors is something I've commonly seen.
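To sketch what I mean (a hypothetical endpoint, with Flask chosen arbitrarily), a bug fix in the shared money logic becomes a single deploy of this one service, with no consumer rebuilt:

    # money_service.py -- a minimal sketch of encapsulating money logic behind an API.
    from decimal import Decimal, ROUND_HALF_UP
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/v1/round", methods=["POST"])
    def round_amount():
        body = request.get_json(force=True)
        amount = Decimal(str(body["amount"]))
        # the rounding rule lives in exactly one place; fixing it is one deploy
        rounded = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        return jsonify({"amount": str(rounded)})

    if __name__ == "__main__":
        app.run(port=8080)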
> In the real world I would not have a shared "money library" to begin with. If there were money-related operations that needed to be used by multiple services, I would have a "money service" which exposed an API and could be deployed independently.
Depending on what functionality the money service handles, this could become a problem.
For example, one example of a shared library type function I've seen in the past is rounding (to make sure all of the rounding rules are handled properly based on configs etc.). An HTTP call for every single low level rounding operation would quickly become a bottleneck.
But really, a shared "money library" is exactly the same thing as a shared "money service" if everyone is using the same, latest version (which is easier to enforce with a networked "service").
The difference is in what's easy and what's hard. With a library, it's easy for everyone to run a different version, and hard for everyone to run the same version. With a service, it's easy for everyone to use the same version, and harder to use a different one (eg. creating multiple environments, and especially ephemeral "pull request" environments where you can mix and match for best automated integration and e2e testing).
But you can apply the same backwards-compatible API design patterns to a library that you would be applying to a service: no difference really. It's only about what's the time to detection when you break these patterns (with a library, someone finds out 2 years later when they update; with a service, they learn right away).
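For illustration, the library-side version of that discipline can be as mundane as adding an optional parameter with a default (hypothetical function and version numbers):

    # shared_money.py, version 1.1.0: a backwards-compatible change to a library API.
    # v1.0.0 exposed round_amount(amount); v1.1.0 adds an optional parameter whose
    # default preserves the old behaviour, so no existing caller is forced to change.
    from decimal import Decimal, ROUND_HALF_UP

    def round_amount(amount: Decimal, places: int = 2) -> Decimal:
        """Round a monetary amount; `places` is new in 1.1.0 and defaults to the
        old hard-coded behaviour (2 decimal places)."""
        exponent = Decimal(1).scaleb(-places)   # e.g. places=2 -> Decimal('0.01')
        return amount.quantize(exponent, rounding=ROUND_HALF_UP)

    # old callers, written against 1.0.0, still work unchanged:
    assert round_amount(Decimal("10.005")) == Decimal("10.01")
    # new callers can opt in to the new capability:
    assert round_amount(Decimal("10.00501"), places=4) == Decimal("10.0050")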
It’s definitely not the same thing.
From the perspective of change management, what’s the difference between a shared library and an internal service relied on by multiple other services?
You still need to make sure changes don’t have unintended consequences downstream
Latency.
I was coming here to say this. The whole idea of a shared library couples all those services together. Sounds like someone wanted to be clever and then included their cleverness all over the platform, dooming all services together.
Decoupling is the first part of microservices. Pass messages. Use json. I shouldn’t need your code to function. Just your API. Then you can be clever and scale out and deploy on saturdays if you want to and it doesn’t disturb the rest of us.
> Pass messages. Use json. I shouldn’t need your code to function. Just your API.
Yes, but there’s likely a lot of common code related to parsing those messages, interpreting them, calling out to other services etc. shared amongst all of them. That’s to be expected. The question is how that common code is structured if everything has to get updated at once if the common code changes.
Common code that’s part of your standard library, sure. Just parse the JSON. Do NOT introduce some shared class library that “abstracts” that away. Instead use versioning of schemas, like another commenter said. Use protobuf. Use Avro. Use JSON. Use Swagger. Use something other than a POCO/POJO shared library that forces you to redeploy all your services because you added a Boolean to the newsletter object.
So, depending on someone else’s shared library, rather than my own shared library, is the difference between a microservice and not a microservice?
This right here. WTF do you do when you need to upgrade your underlying runtime such as Python, Ruby, whatever ¯\_(ツ)_/¯ you gotta go service by service.
If need be. Or you upgrade the mission-critical ones and leave the rest for when you pick them up again. If your culture is “leave it better than you found it”, this is a non-issue.
The best is when you use containers and build against the latest runtimes in your pipelines so as to catch these issues early and always have the most up to date patches. If a service hasn’t been updated or deployed in a long time, you can just run another build and it will pull latest of whatever.
The opposite situation of needing to upgrade your entire company's codebase all at once is much more painful. With services you can upgrade runtimes on an as-needed basis. In monoliths, runtime upgrades were massive projects that required a ton of coordination between teams and months or years of work.
Fair point.
One way is by using backwards-compatible schemas to communicate between them. E.g. with Avro it's quite nice.
But then you're outsourcing the same shared-code problem to a third-party shared library. It fundamentally doesn't go away.
That 3rd party library rarely gets updated whereas Jon’s commit adds a field and now everyone has to update or the marshaling doesn’t work.
Yes, there are scenarios where you have to deploy everything but when dealing with micro services, you should only be deploying the service you are changing. If updating a field in a domain affects everyone else, you have a distributed monolith and your architecture is questionable at best.
The whole point is I can deploy my services without relying on yours, or touching yours, because it sounds like you might not know what you’re doing. That’s the beautiful effect of a good micro service architecture.
I was trying to think of better terminology. Perhaps this works:
Two services can have a common dependency, which still leaves them uncoupled. An example would be a JSON schema validation and serialization/deserialization library. One service can in general bump its dependency version without the other caring, because it'll still send and consume valid JSON.
Two services can have a shared dependency, which couples them. If one service needs to bump its version the other must also bump its version, and in general deployment must ensure they are deployed together so only one version of the shared dependency is live, so to speak. An example could be a library containing business logic.
If you had two independent microservices and added a shared library as per my definition above, you've turned them into a distributed monolith.
Sometimes a common dependency might force a shared deployment, for example a security bug in the JSON library. But that is the exception, unlike with the business-logic library, where the exception is that one service could bump its version without the other caring.
The third party shared library doesn't know your company exists. This means the third party dependency doesn't contain any business or application specific code and is applicable to any software project. This in turn means it has to solve the majority of business use cases ahead of time and be thoroughly tested to not break any consumers.
The problem has fundamentally gone away and reduced itself to a simple update problem, which itself is simpler because the update schedule is less frequent.
I use tomcat for all web applications. When tomcat updates I just need to bump the version number on one application and move on to the next. Tomcat does not involve itself in the data that is being transferred in a non-generic way so I can update whenever I want.
Since nothing blocks updates, the updates happen frequently which means no application is running on an ancient tomcat version.
Yeah this seems very much not a microservices setup.
I don't pretend proper microservices are a magic solution... but if you break the rules/system of microservices, that's not "microservices" being bad, that's just creating problems for yourself.
So you should re-write your logging code on each and every one of your 140+ services vs. leverage a shared module?
You can keep using an older version for a while. You shouldn't need to redeploy everything at once. If you can't keep using the older version, you did it wrong.
And ideally, your logging library should rarely need to update. If you need unique integrations per service, use a plug-in architecture and keep the plug-ins local to each service.
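As a rough illustration of that plug-in idea (the interface and names are hypothetical, not any particular library): the shared piece stays tiny and stable, while each service registers its own integrations locally.

```go
package logging

// Entry is the stable shape the shared library understands.
type Entry struct {
	Level   string
	Message string
	Fields  map[string]any
}

// Sink is the extension point: per-service integrations (stdout, a vendor
// agent, etc.) implement it and live in that service's own repo.
type Sink interface {
	Write(Entry) error
}

type Logger struct {
	sinks []Sink
}

func (l *Logger) Register(s Sink) { l.sinks = append(l.sinks, s) }

func (l *Logger) Info(msg string, fields map[string]any) {
	e := Entry{Level: "info", Message: msg, Fields: fields}
	for _, s := range l.sinks {
		_ = s.Write(e) // logging failures are deliberately non-fatal
	}
}
```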
I wasn't taking into account the velocity of a fleet-wide rollout; I agree you can migrate over time. However, I was focusing on the idea that any type of fleet-wide rollout for a specific change was somehow "bad."
While I think that's a bit harsh :-) the sentiment of "if you have these problems, perhaps you don't understand systems architecture" is kind of spot on. I have heard people scoff at a bunch of "dead legacy code" in the Windows APIs (as an example) without understanding the challenge of moving millions of machines, each at different places in the evolution timeline, through to the next step in the timeline.
To use an example from the article, there was this statement: "The split to separate repos allowed us to isolate the destination test suites easily. This isolation allowed the development team to move quickly when maintaining destinations."
This is architecture bleed-through. The format produced by Twilio "should" be the canonical form, which is submitted to the adapter, which mangles it into the "destination" form. Great, that transformation is expressible semantically in a language that takes the canonical form and spits out the special form. Changes to the transformation expression should not "bleed through" to other destinations, and changes to the canonical form should be backwards compatible to prevent bleed-through of changes in the source from impacting the destination. At all times, if something worked before, it should continue to work without touching it, because the architecture boundaries are robust.
Being able to work with a team that understood this was common "in the old days" when people were working on an operating system. The operating system would evolve (new features, new devices, new capabilities) but because there was a moat between the OS and applications, people understood that they had to architect things so that the OS changes would not cause applications that currently worked to stop working.
I don't judge Twilio for not doing robust architecture; I was astonished, when I went to work at Google, at how lazy everyone got when the entire system is under their control (as in, there are no third-party apps running in the fleet). There was a persistent theme of some bright person "deciding" to completely change some interface and wham! every other group at Google had to stop what they were doing and move their code to the new thing. There was a particularly poor 'mandate' on a new version of their RPC while I was there. As Twilio notes, that can make things untenable.
Agreed. It sounds like they never made it to the distributed architecture they would have benefited from. That said, if the team thrives on a monolithic one they made the right choice.
Then every microservice network in existence is a distributed monolith so long as they communicate with one another.
If you communicate with one another you are serializing and deserializing a shared type. That shared type will break at the communication channels if you do not simultaneously deploy the two services. The irony is to prevent this you have to deploy simultaneously and treat it as a distributed monolith.
This is the fundamental problem of microservices. Under a monorepo it is somewhat mitigated because now you can have type checking and integration tests across multiple services.
Make no mistake, the world isn’t just library dependencies. There are communication dependencies that flow through communication channels. A microservice architecture by definition has all its services depend on each other through these communication channels. The logical outcome of this is virtually identical to a distributed monolith. In fact shared libraries don’t do much damage at all if the versions are off. It is only shared types in the communication channels that break.
There is no way around this unless you have a mechanism for simultaneous merging code and deploying code across different repos which breaks the definition of what it is to be a microservice. Microservices always and I mean always share dependencies with everything they communicate with. All the problems that come from shared libraries are intrinsic to microservices EVEN when you remove shared libraries.
People debate me on this but it’s an invariant.
I believe in the original amazon service architecture, that grew into AWS (see “Bezos API mandate” from 2002), backwards compatibility is expected for all service APIs. You treat internal services as if they were external.
That means consumers can keep using old API versions (and their types) with a very long deprecation window. This results in loose coupling. Most companies doing microservices do not operate like this, which leads to these lockstep issues.
Yeah. that's a bad thing right? Maintaining backward compatibility to the end of time in the name of safety.
I'm not saying monoliths are better than microservices.
I'm saying for THIS specific issue, you will not need to even think about API compatibility with monoliths. It's a concept you can throw out the window because type checkers and integration tests catch this FOR YOU automatically, and the single deployment ensures that compatibility will never break.
If you choose monoliths you are CHOOSING this convenience; if you choose microservices you are CHOOSING the possibility for things to break, and AWS chose this and then chose to introduce a backwards-compatibility restriction to deal with the problem.
I use "choose" loosely here. More likely AWS ppl just didn't think about this problem at the time. It's not obvious... or they had other requirements that necessitated microservices... The point is, this problem in essence is a logical consequence of the choice.
> Yeah. that's a bad thing right? Maintaining backward compatibility to the end of time in the name of safety.
This is what I don't get about some comments in this thread. Choosing internal backwards compatibility for services managed by a team of three engineers doesn't make a lot of sense to me. You (should) have the organizational agility to make big changes quickly; not a lot of consensus building should be required.
For the S3 APIs? Sure, maintaining backwards compatibility on those makes sense.
Backwards compatibility is for customers. If customers don’t want to change apis… you provide backwards compatibility as a service.
If you’re using backwards compatibility as safety and that prevents you from doing a desired upgrade to an api that’s an entirely different thing. That is backwards compatibility as a restriction and a weakness in the overall paradigm while the other is backwards compatibility as a feature. Completely orthogonal actions imo.
> or they had other requirements that necessitated microservices
Scale
Both in people, and in "how do we make this service handle the load". A monolith is easy if you have few developers and not a lot of load.
With more developers it gets hard as they start affecting eachother across this monolith.
With more load it gets difficult as the usage profile of a backend server becomes very varied and performance issues become hard to even find. What looks like a performance loss in one area might just be another unrelated part of the monolith eating your resources.
Exactly, performance can make it necessary to move away from a monolith.
But everyone should know that microservices are more complex systems and harder to deal with and a bunch of safety and correctness issues that come with it as well.
The problem here is that not many people know this. Some people think going to microservices makes your code better, when (as I'm clearly saying here) you give up safety and correctness as a result.
> If you communicate with one another you are serializing and deserializing a shared type.
Yes, this is absolutely correct. The objects you send over the wire are part of an API which forms a contract the server implementing the API is expected to provide. If the API changes in a way which is not backwards compatible, this will break things.
> That shared type will break at the communication channels if you do not simultaneously deploy the two services.
This is only true if you change the shared type in a way which is not backwards compatible. One of the major tenets of services is that you must not introduce backwards incompatible changes. If you want to make a fundamental change, the process isn't "change APIv1 to APIv2", it's "deploy APIv2 alongside APIv1, mark APIv1 as deprecated, migrate clients to APIv2, remove APIv1 when there's no usage."
This may seem arduous, but the reality is that most monoliths already deal with this limitation! Don't believe me? Think about a typical n-tier architecture with a backend that talks to a database; how do you do a naive, simple rename of a database column in e.g. MySQL in a zero-downtime manner? You can't. You need to have some strategy for dealing with the backwards incompatibility which exists when your code and your database do not match. The strategy might be a simple add new column->migrate code->remove old column, including some thought on how to deal with data added in the interim. It might be to use views. It might be some insane strategy of duplicating the full stack, using change data capture to catch changes and flipping a switch.[0] It doesn't really matter, the point is that even within a monolith, you have two separate services, a database and a backend server, and you cannot deploy them truly simultaneously, so you need to have some strategy for dealing with that; or more generally, you need to be conscious of breaking API changes, in exactly the same way you would with independent services.
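A minimal Go/MySQL sketch of that add-new-column -> migrate code -> remove-old-column dance (the table and column names are invented for illustration); the point is that two mutually incompatible "deployments", code and schema, coexist for a while:

```go
package main

import "database/sql"

// Phase 1 (expand): both columns exist, so the application dual-writes and
// old and new code can run side by side.
func saveName(db *sql.DB, userID int64, name string) error {
	_, err := db.Exec(
		`UPDATE users SET full_name = ?, display_name = ? WHERE id = ?`,
		name, name, userID,
	)
	return err
}

// Phase 2 (migrate reads): prefer the new column, falling back to the old
// one for rows not yet backfilled.
func loadName(db *sql.DB, userID int64) (string, error) {
	var name string
	err := db.QueryRow(
		`SELECT COALESCE(display_name, full_name) FROM users WHERE id = ?`,
		userID,
	).Scan(&name)
	return name, err
}

// Phase 3 (contract): once nothing reads full_name any more, drop the old
// column in a separate, later migration.
```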
> The logical outcome of this is virtually identical to a distributed monolith.
Having seen the logical outcome of this at AWS, Hootsuite, Splunk, among others: no this isn't true at all really. e.g. The RDS team operated services independently of the EC2 team, despite calling out to EC2 in the backend; in no way was it a distributed monolith.
[0] I have seen this done. It was as crazy as it sounds.
Managing two services is very different than managing 140. And databases have a lot of tooling, support, and documentation around migrations.
>This is only true if you change the shared type in a way which is not backwards compatible. One of the major tenets of services is that you must not introduce backwards incompatible changes. If you want to make a fundamental change, the process isn't "change APIv1 to APIv2", it's "deploy APIv2 alongside APIv1, mark APIv1 as deprecated, migrate clients to APIv2, remove APIv1 when there's no usage."
Agreed, and this is a negative. Backwards compatibility is a restriction made to deal with something fundamentally broken.
Additionally, eventually in any system of services you will have to make a breaking change. Backwards compatibility is a behavioral coping mechanism for dealing with a fundamental issue of microservices.
>This may seem arduous, but the reality is that most monoliths already deal with this limitation! Don't believe me? Think about a typical n-tier architecture with a backend that talks to a database; how do you do a naive, simple rename of a database column in e.g. MySQL in a zero-downtime manner? You can't. You need to have some strategy for dealing with the backwards incompatibility.
I believe you and am already aware. It's a limitation that exists intrinsically, so it exists because you have no choice: a database and a monolith need to exist as separate services. The thing I'm addressing here is the microservices vs. monolith debate. If you choose microservices, you are CHOOSING for this additional problem to exist. If you choose a monolith, then within that monolith you are CHOOSING for those problems to not exist.
I am saying regardless of the other issues with either architecture, this one is an invariant in the sense that for this specific thing, monolith is categorically better.
>Having seen the logical outcome of this at AWS, Hootsuite, Splunk, among others: no this isn't true at all really. e.g. The RDS team operated services independently of the EC2 team, despite calling out to EC2 in the backend; in no way was it a distributed monolith.
No, you're categorically wrong. If they did this at ANY of the companies you worked at, then they are living with this issue. What I'm saying here isn't an opinion. It is a theorem-based consequence that will occur IF all the axioms are satisfied: namely, >2 services that communicate with each other and are not deployed simultaneously. This is logic.
The only way errors or issues never happened with any of the teams you worked with is if the services they were building NEVER needed to make a breaking change to the communication channel, or they never needed to communicate. Neither of these scenarios is practical.
> The only way errors or issues never happened with any of the teams you worked with is if the services they were building NEVER needed to make a breaking change to the communication channel, or they never needed to communicate. Neither of these scenarios is practical.
IMO the fundamental point of disagreement here is that you believe it is effectively impossible to evolve APIs without breaking changes.
I don't know what to tell you other than, I've seen it happen, at scale, in multiple organizations.
I can't say that EC2 will never make a breaking change that causes RDS, Lambda, or auto-scaling to break, but if they do, it'll be front-page news.
>IMO the fundamental point of disagreement here is that you believe it is effectively impossible to evolve APIs without breaking changes.
No, it's certainly possible. You can evolve Linux, macOS and Windows forever without any breaking changes and keep all APIs backwards compatible for all time. Keep going forever and ever and ever. But you see there's a huge downside to this, right? This downside becomes more and more magnified as time goes on. In the early stages it's fine. And it's not like this growing problem will stop everything in its tracks. I've seen organizations hobble along forever with tech debt that keeps increasing for decades.
The downside won't kill an organization. I'm just saying there is a way that is better.
>I don't know what to tell you other than, I've seen it happen, at scale, in multiple organizations.
I have as well. It doesn't mean it doesn't work and can't be done. For example typescript is better than javascript. But you can still build a huge organization around javascript. What I'm saying here is one is intrinsically better than the other but that doesn't mean you can't build something on technology or architectures that are inferior.
And also I want to say I'm not saying monoliths are better than microservices. I'm saying for this one aspect monoliths are definitively better. There is no tradeoff for this aspect of the debate.
>I can't say that EC2 will never make a breaking change that causes RDS, Lambda, or auto-scaling to break, but if they do, it'll be front-page news.
Didn't a break happen recently? Barring that... there are behavioral ways to mitigate this, right? Like what you mentioned: backwards-compatible APIs always. But it's better to set up your system such that the problem just doesn't exist, period, rather than to set up ways to deal with the problem.
> The only way errors or issues never happened with any of the teams you worked with is if the services they were building NEVER needed to make a breaking change to the communication channel, or they never needed to communicate.
This is correct.
> Neither of these scenarios is practical.
This is not. When you choose appropriate tools (protobuf being an example), it is extremely easy to make a non-breaking change to the communication channel, and it is also extremely easy to prevent breaking changes from being made ever.
I don't agree.
Protobuf works best if you have a monorepo. If each of your services lives within its own repo, then upgrades to one repo can be merged onto the main branch that potentially break things in other repos. Protobuf cannot check for this.
Second, the other safety check protobuf uses is backwards compatibility. But that's an arbitrary restriction, right? It's better to not even have to worry about backwards compatibility at all than it is to maintain it.
Categorically these problems don't even exist in the monolith world. I'm not taking a side in the monolith vs. microservices debate. All I'm saying is for this aspect monoliths are categorically better.
You usually can't simultaneously deploy two services. You can try, but in a non trivial environment there are multiple machines and you'll want a rolling upgrade, which causes an old client to talk to a new service or vice versa. Putting the code into a monorepo does nothing to fix this.
This is much less of a problem than it seems.
You can use a serialisation format that allows easy backward compatible additions. The new service that has a new feature adds a field for it. The old client, responsibly coded, gracefully ignores the field it doesn't understand.
You can version the API to allow for breaking changes, and serve old clients old responses, and new clients newer responses. This is a bit of work to start and sometimes overkill, given the first point
If you only need very rare breaking changes, you can deploy new-version-tolerant clients first, then when that's fully done, deploy the new-version service. It's a bit of faff, but if it's very rare and internal, it's often easier than implementing full versioning.
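The versioning option in particular is mundane in practice. A minimal Go sketch (paths and payload shapes invented for illustration) is just two handlers served by the same deploy:

```go
package main

import (
	"encoding/json"
	"net/http"
)

type orderV1 struct {
	ID    string  `json:"id"`
	Total float64 `json:"total"` // v1: a bare float
}

type orderV2 struct {
	ID    string `json:"id"`
	Total struct {
		Amount   int64  `json:"amount"`   // v2 breaking change: minor units...
		Currency string `json:"currency"` // ...plus an explicit currency
	} `json:"total"`
}

func main() {
	// Old clients keep hitting /v1 untouched; new clients opt in to /v2.
	http.HandleFunc("/v1/orders/42", func(w http.ResponseWriter, r *http.Request) {
		_ = json.NewEncoder(w).Encode(orderV1{ID: "42", Total: 19.99})
	})
	http.HandleFunc("/v2/orders/42", func(w http.ResponseWriter, r *http.Request) {
		o := orderV2{ID: "42"}
		o.Total.Amount, o.Total.Currency = 1999, "USD"
		_ = json.NewEncoder(w).Encode(o)
	})
	_ = http.ListenAndServe(":8080", nil)
}
```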
> You usually can't simultaneously deploy two services
Yeah, it’s a roundabout solution to create something to deploy two things simultaneously. Agreed.
> Putting the code into a monorepo does nothing to fix this.
It helps mitigate the issue somewhat. With a polyrepo you suffer from an identical problem with the type checker and the integration tests. The checkers basically need all services to be at the same version to do a full and valid check, so if you have different teams and different repos the checkers will never know if team A made a breaking change that will affect team B, because the integration test and type checker can’t stretch to another repo. Even if they could stretch to another repo you would need to do a “simultaneous” merge… in a sense polyrepos suffer from the same issue as microservices at the CI verification layer.
So if you have microservices and polyrepos you are suffering from a twofold problem. Your static checks and integration tests are never correct: they are always either failing and preventing you from merging, or deliberately crippled so as to not validate things across repos. At the same time your deploys are also guaranteed to break if a breaking API change is made. You literally give up safety in testing, safety in type checking and working deploys by going microservices and polyrepos.
Like you said, it can be fixed with backwards compatibility, but it’s a bad thing to restrict your code that way.
> This is much less of a problem than it seems.
It is not “much less of a problem than it seems”, because big companies have developed methods to do simultaneous deploys. See Netflix. If they took the time to develop a solution it means it’s not a trivial issue.
Additionally are you aware of any api issues in communication between your local code in a single app? Do you have any problems with this such that you are aware of it and come up with ways to deal with it? No. In a monolith the problem is nonexistent and it doesn’t even register. You are not aware this problem exists until you move to micro-services. That’s the difference here.
> You can use a serialisation format that allows easy backward compatible additions.
Mentioned a dozen times in this thread. Backwards compatibility is a bad thing. It’s a restriction that freezes all technical debt into your code. Imagine if Python 3 had stayed backwards compatible with 2, or if the current version of macOS were still compatible with binaries from the first Mac.
> You can version the API to allow for breaking changes, and serve old clients old responses, and new clients newer responses. This is a bit of work to start and sometimes overkill, given the first point
Can you honestly tell me this is a good thing? The fact that you have to pay attention to this in microservices, while in a monolith you don’t even need to be aware there’s an issue, tells you all you need to know. You’re just coming up with behavioral workarounds and coping mechanisms to make microservices work in this area. You’re right, it does work. But it’s a worse solution for this problem than monoliths, which don’t have these workarounds because these problems don’t exist in monoliths.
> If you only need very rare breaking changes, you can deploy new-version-tolerant clients first, then when that's fully done, deploy the new-version service. It's a bit of faff, but if it's very rare and internal, it's often easier than implementing full versioning.
It’s only very rare in microservices because microservices are weaker here. You deliberately make it rare because of this problem. Is it rare to change a type in a monolith? No. It happens regularly. See the problem? You’re not realizing it, but everything you’re bringing up is behavioral action to cope with an aspect that is fundamentally weaker in microservices.
Let me conclude to say that there are many reasons why microservices are picked over monoliths. But what we are talking about here is definitively worse. Once you go microservices you are giving up safety and correctness and replacing it with work arounds. There is no trade off for this problem it is a logical consequence of using microservices.
Of course things are easier if you can run all your code in one binary on one machine, without remote users or any need to scale.
As soon as you add users you need to start coping with backwards compatibility though, even if your backend is still a monolith.
The backend being a monolith is probably easier for a while yes, but I've also lost count of the number of companies I've been to who have been in the painful process of breaking apart their monolith, because it doesn't scale and is hard to work with
Microservices or SOA aren't trivial, but the problems you bring up as extremely hard are pretty easy to deal with once you know how; and it buys you independent deploys and scale, which are pretty useful things at a bigger place
> That shared type will break at the communication channels if you do not simultaneously deploy the two services.
No. Your shared type is too brittle to be used in microservices. Tools like the venerable protobuf solved this problem decades ago. You have a foundational wire format that does not change. Then you have a schema layer that could change in backwards compatible ways. Every new addition is optional.
Here’s an analogy. Forget microservices. Suppose you have a monolithic app and a SQL database. The situation is just like when you change the schema of the SQL database: of course you have application code that correctly deals with both the previous schema and the new schema during the ALTER TABLE. And the foundational wire format that you use to talk to the SQL database does not change. It’s at a layer below the schema.
This is entirely a solved problem. If you think this is a fundamental problem of microservices, then you do not grok microservices. If you think having microservices means simultaneous deployments, you also do not grok microservices.
False. Protobuf solves nothing.
1. Protobuf requires a monorepo to work correctly. Shared types must be checked across all repos and services simultaneously. Without a monorepo or some crazy workaround mechanism this won't work. Think about it: these type checkers need everything at the same version to correctly check everything.
2. Even with a monorepo, deployment is a problem. Unless you do simultaneous deploys, if one team upgrades their service and another team doesn't, the shared type is incompatible, simply because you used microservices and polyrepos to allow teams to move async instead of in sync. It's a race condition in distributed systems and it's theoretically true. Not solved at all, because it can't be solved by logic and math.
Just kidding. It can be solved but you're going to have to change definitions of your axioms aka of what is currently a microservice, monolith, monorepo and polyrepo. If you allow simultaneous deploys or pushes to microservices and polyrepos these problems can be solved but then can you call those things microservices or polyrepos? They look more like monorepos or monoliths... hmmm maybe I'll call it "distributed monolith".... See we are hitting this problem already.
>Here’s an analogy. Suppose you have a monolithic app and a SQL database. The situation is just like when you change the schema of the SQL database: of course you have application code that correctly deals with the previous schema and the new schema during the ALTER TABLE. And the foundational wire format that you use to talk to the SQL database does not change. It’s at a layer below the schema.
You are just describing the problem I pointed out. We call "monoliths" monoliths, but technically a monolith must interact with a secondary service called a database. We have no choice in the matter. The monolith vs. microservice debate of course does not refer to that interaction, which SUFFERS from all the same problems as microservices.
>This is entirely a solved problem. If you think this is a fundamental problem of microservices, then you do not grok microservices. If you think having microservices means simultaneous deployments, you also do not grok microservices.
No it's not. Not at all. It's a problem that's lived with. I have two modules in a monolith. ANY change that goes into the mainline branch or deploy is type checked and integration tested to provide maximum safety as integration tests and type checkers can check the two modules simultaneously.
Imagine those two modules as microservices. Because they can be deployed at any time asynchronously, and because they can be merged to the mainline branch at any time asynchronously, they cannot be type checked or integration tested together. Why? If I upgrade A, which requires an upgrade to B, but B is not upgraded yet, how do I type check both A and B at the same time? Axiomatically impossible. Nothing is solved. Just behavioral coping mechanisms to deal with the issue. That's the key phrase: behavioral coping mechanisms, as opposed to automated, statically checked safety based on mathematical proof. Most of the arguments from your side will consist of this: "behavioral coping mechanisms"
> Then you have a schema layer that could change in backwards compatible ways. Every new addition is optional.
Also known as the rest of the fucking owl. I am entirely in factual agreement with you, but the number of people who are even aware they maintain an API surface with backwards compatibility as a goal, let alone can actually do it well, are tiny in practice. Especially for internal services, where nobody will even notice violations until it’s urgent, and at such a time, your definitions won’t save you from blame. Maybe it should, though. The best way to stop a bad idea is to follow it rigorously and see where it leads.
I’m very much a skeptic of microservices, because of this added responsibility. Only when the cost of that extra maintenance is outweighed by overwhelming benefits elsewhere, would I consider it. For the same reason I wouldn’t want a toilet with a seatbelt.
Bingo. Couldn't agree more. The other posters in this comment chain seem to view things from a dogmatic approach vs a pragmatic approach. It's important to do both, but individuals should call out when they are discussing something that is practiced vs preached.
If you've run a microservice stack or N at scale with good results, someone saying it's impossible doesn't look pragmatic
I’m not commenting on the pragmatic part.
My thesis is logical and derived from axioms. You will have fundamental incompatibilities between services’ APIs if one service changes its API. That’s a given. It’s 1 + 1 = 2.
Now I agree there are plenty of ways to successfully deal with these problems like api backwards compatibility, coordinated deploys… etc… etc… and it’s a given thousands of companies have done this successfully. This is the pragmatic part, but that’s not ultimately my argument.
My argument is that none of the pragmatisms and methodologies to deal with those issues need to exist in a monolithic architecture, because the problem itself doesn’t exist in a monolith.
Nowhere did I say microservices can’t be successfully deployed. I only stated that there are fundamental issues with microservices that by logic must occur definitionally. The issue is people are biased. They tie their identity to an architecture because they advocated it for too long. The funniest thing is that I didn’t even take a side. I never said microservices were better or worse. I was only talking about one fundamental problem with microservices. There are many reasons why microservices are better but I just didn’t happen to bring it up. A lot of people started getting defensive and hence the karma.
Agreed. What I’m describing here isn’t solely pragmatic, it’s axiomatic as well. If you model this as a distributed system with a graph, all microservices by definition will always reach a state where the APIs are broken.
Most microservice companies either live with that fact or they have roundabout ways to deal with it, including simultaneous deploys across multiple services and simultaneous merging, CI and type checking across different repos.
Once all the code for the services lived in one repo there was nothing preventing them from deploying the thing 140 times. I’m not sure why they act like that wasn’t an option.
If there’s any shared library across all your services, even a third-party library, then when that library has a security patch you need to update it across your entire service fleet. Maybe you don’t have that; maybe each service is written in a completely different programming language, uses a different database, and reimplements monitoring in a totally different way. In that case you have completely different problems.
Everyone needing to update due to a security thing happens infrequently. Otherwise, coding and deploying may be happening multiple times a day.
We have had shared libraries. Teams updated to them when they next wanted to. When it was important, on call people made it happen asap. Zero issues.
Monorepos that are reasonably well designed and flexible enough to grow with you can increase development speed quite a bit.
Imagine your services were built on react-server-* components or used Log4J logging.
This is simply dependency hell exploding with microservices.
100%. It's almost like they jumped into it not understanding what they were signing up for.
> If you must to deploy every service because of a library change
Hello engineer. Jira ticket VULN-XXX has been assigned to you as your team's on-call engineer.
A critical vulnerability has been found in the netxyz library. Please deploy service $foo after SHA before 2025-12-14 at 12:00 UTC.
Hello engineer. Jira ticket VULN-XXX has been assigned to you as your team's on-call engineer.
A critical vulnerability has been found in the netxyz library. Please deploy service $bar after SHA before 2025-12-14 at 12:00 UTC.
...
It's never ending. You get a half dozen of these on each on call rotation.
My experience doesn't align with yours. I worked at SendGrid for over a decade and they were on the (micro) service train. I was on call for all dev teams on a rotation for a couple of years and later just for my team.
I have seen like a dozen security updates like you describe.
This was at a fintech and we took every single little vuln with the utmost priority. Triaged by severity of course, but everything had a ticking clock.
We didn't just have multiple security teams, we had multiple security orgs. If you didn't stay in compliance with VULN SLAs, you'd get a talking to.
We also had to frequently roll secrets. If the secrets didn't support auto-rotation, that was also a deployment (with other steps).
We also had to deploy our apps if they were stale. It's dangerous not to deploy your app every month or two, because who knows if stale builds introduced some kind of brittleness? Perhaps a change to some net library you didn't deploy caused the app not to tolerate traffic spikes. And it's been six months and there are several such library changes.
I don't know what a call rotation is, but I keep getting email flooded by half a dozen Linux vulnerabilities every day and it's getting old.
In my previous company we did everything as a micro service. In the company before that it was serverless on AWS!
In both cases we had to come up with clever solutions simply to get by, because communication between services is a problem. It is difficult (not impossible) to keep all the contracts in sync, and deployment sometimes has to be coordinated in a very specific way. The initial speed you get is soon lost further down the path due to added complexity. There was fear-driven development at play. Service ownership is a problem. Far too many meetings are spent on coordination.
In my latest company everything is part of the same monolith. Yes, the code is huge, but it is so much easier to work with. We use a lot more unit tests than integration tests. Types make sense. Refactoring is just so easy. All the troubleshooting tools, including specialised AI agents built on top of our own platform, are part of the code-base, which is kind of interesting because I can see how this is turning into a self-improving system. It is fascinating!
We are not planning to break up the monolith unless we grow so much that it is impossible to manage from a single git repository. As far as I can tell this may never happen, as it is obvious that much larger projects are perfectly well maintained in exactly the same way.
The only downside is that the build takes longer, but honestly we found ways around that as well in the past, and now, with further improvements in the toolchains delivered by the awesome open-source communities around the world, I expect to see at least a 10x improvement in deployment time in 2026.
Overall, in my own assessment, the decision to go for a monolith allowed us to build and scale much faster than if we had used micro services.
I hope this helps.
My experience is the opposite. I worked at SendGrid for a decade and we scaled the engineering org from a dozen to over 500 operating at scale sending billions of messages daily. Micro services. Well, services. The word micro messes people up.
I have also worked at half a dozen other shops with various architectures.
In every monolith, people violate separation of concerns and things are tightly coupled. I have only ever seen good engineering velocity happen when teams are decoupled from one another. I have only seen this happen in a (micro) service architecture.
Overall, in my own assessment, the decision to stick with a monolith has slowed down velocity and placed limits on scale at every other company I have been at, and required changes towards decoupled services to be able to ship with any kind of velocity.
The place I just left took 2 years, over 50 teams, and over 150 individual contributors to launch a product that required us to move an interface for sending messages over from ORM querysets to DTOs. We needed to unlock our ability to start rearchitecting the modules, because before it was impossible to know the actual edges of the system and how it used the data. This was incredibly expensive and hard, and it would never have been necessary but for the ability to reach into others' domains and make assumptions, which made everything hard.
Don't couple systems. Micro services are the only arch I have seen successfully do this.
You say that every monolith you’ve seen has devolved into bad engineering — coupling, crossing boundaries. What was missing that could have stopped this? A missing service boundary you’d say, but also a lack of engineering leadership or lack of others’ experience? No code review? A commercial non-founder CEO pushing for results at the expense of codebase serviceability? Using a low-ceremony language (no types, no interfaces)?
You can stop coupling by enforcing boundaries. Repository boundaries are extremely solid ones. Too solid for some people, making it unnecessarily hard to coordinate changes on either side of the boundary. Barely solid enough for others, where it’s clearly too dangerous to let their devs work without solid brick walls keeping their hackers apart.
Coupling, smudged boundaries, and incoherence are symptoms of something more fundamental than simply we didn’t use services like we should have. If everyone’s getting colds off each other it’s because of bad hygiene in the office. Sure, you could force them to remain in their cubicles or stay at home but you could also teach them to wash their hands!
Generalizations don't help almost any discussion. Even if that's 100% of what you ever saw, many people have seen mixed (or entirely in the other extreme).
> In every monolith, people violate separation of concerns and things are tightly coupled. I have only ever seen good engineering velocity happen when teams are decoupled from one another. I have only seen this happen in a (micro) service architecture.
I would write this off as indifferent or incompetent tech leadership. Even languages that people call obscure -- like Elixir that I mostly work with -- have excellent libraries and tools that can help you draw enforceable boundaries. Put that in CI or even in pre-commit hook. Job done.
Why was that never done?
Of course people will default to the easier route. It's on tech leadership to keep them to higher standards.
Funny you mention Elixir. At one company, we passed around Ecto querysets. It started when the company was smaller. Then someone needed a little bit of analytics. A few years of organic growth later and the system was bogged down. Queries were joining all over the place, and separating out the analytics from everything else was, again, a major undertaking.
I would love to see a counter example in real life at an org with over a dozen teams. A well working monolith and a well working monorepo are like unicorns; I don't believe they exist and everyone is talking about and trying to sell their mutant goat as one.
I am not selling you anything so you're starting off from a wrong premise.
What I said is that you should consider your experience prone to a bubble environment and as such it's very anecdotal. So is mine (a mix), granted. Which only means that neither extreme dominates out there. Likely a normal bell curve distribution.
What I did say (along with others) was that a little bit of technical discipline -- accentuating on "little" here -- nullifies the stated benefits of microservice architecture.
And it seems to me that the microservice architecture was chosen to overcome organizational problems, not technical ones.
Reading it with hindsight, their problems have less to do with the technical trade off of micro or monolith services and much more to do with the quality and organizational structure of their engineering department. The decisions and reasons given shine a light on the quality. The repository and test layout shine a light on the structure.
Given the quality and the structure neither approach really matters much. The root problems are elsewhere.
My observation is that many teams lack strong "technical discipline"; someone that says "no, don't do that", makes the case, and takes a stand. It's easy to let the complexity genie out of the bottle if the team doesn't have someone like this with enough clout/authority to actually make the team pause.
I think the problem is that this microservices vs monolith decision is a really hard one to convince people of. I made a passionate case for ECS instead of lambda for a long time, but only after the rest of the team and leadership see the problems the popular strategy generates do we get something approaching uptake (and the balance has already shifted to kubernetes instead, which is at least better)
There's a lot of good research and writing on this topic. This paper, in particular has been really helpful for my cause: https://dl.acm.org/doi/pdf/10.1145/3593856.3595909
It has a lot going for it: 1) it's from Google, 2) it's easy to read and digest, 3) it makes a really clear case for monoliths.
I 100% agree with you, but the sad fact is that it’s easy to understand why people don’t want to take on this role. You can make enemies easily, you need to deliver “bad news” and convince people to put in more effort, or prove that the effort they did put in was not enough. Why bother, when you probably won’t be the one that has to clean it up?
Ha! I wish I worked at the places you have worked!
>the quality and organizational structure of their engineering department
You're not kidding. I had to work with twilio on a project and it was awful. Any time there was an issue with the API, they'd never delve into why that issue had happened. They'd simply fix the data in their database and close the ticket. We'd have the same issue over and over and over again and they'd never make any effort to fix the cause of the problems.
This is probably the first time I’ve seen a human use the word “delve”.
It immediately triggered my - is this AI?
Maybe you just don't read many books written after the year 2000? It was a pretty common word even before ChatGPT: https://books.google.com/ngrams/graph?content=delve&year_sta...
But perhaps the most famous source is Tolkien: "The Dwarves tell no tale; but even as mithril was the foundation of their wealth, so also it was their destruction: they delved too greedily and too deep, and disturbed that from which they fled, Durin's Bane."
As a non-native speaker, I read a lot of fantasy and science fiction books in English. I use "delve" regularly (I wouldn't say "frequently" though). Not sure if it's Terry Pratchett's Discworld influence, but plenty of archaic sounding words there.
I did not even know it was considered uncommon and archaic, tbh.
I guess there may be regional differences but delve is a commonly used word for native speakers.
People from different countries especially where English is not their first language often have more esoteric words in their vocabulary.
Conway's Law shines again!
It's amazing how much explanatory power it has, to the point that I can predict at least some traits about a company's codebase during an interview process, without directly asking them about it.
In this case, the more applicable are:
1. "Peter principle": "people in a hierarchy and organizations tend to rise to 'a level of respective incompetence' "
2. "Parkinson's law": "Work expands to fill the available time".
So people are filling all the available time and working tirelessly to reach their personal and organizational levels of incompetence; working hard without stopping to think whether what they are doing should be done at all. And nobody is stopping them, nobody asks why (with a real analysis of positives, negatives, and risks).
Incompetent + driven is the worst combination there can be.
A few thoughts: this is not really a move to a monolith. Their system is still a SOA (service-oriented architecture), just like microservices (make services as small as they can be), but with larger scope.
Having 140 services managed by what sounds like one team reinforces another point that I believe should be well known by now: you use SOAs (including microservices) to scale teams, not services.
Eg. if a single team builds a shared library for all the 140 microservices and needs to maintain them, it's going to become very expensive quickly: you'll be using v2.3.1 in one service and v1.0.8 in another, and you won't even know yourself what API is available. Operationally, yes, you'll have to watch over 140 individual "systems" too.
There are ways to mitigate this, but they have their own trade-offs (I've posted them in another comment).
As per Conway's law, software architecture always follows the organizational structure, and this seems to have happened here: a single team is moving away from unneeded complexity to more effectively manage their work and produce better outcomes for the business.
It is not a monolith, but properly-scoped service level (scoped to the team). This is, in my experience, the sweet spot. A single team can run and operate multiple independent services, but with growth in those services, they will look to unify, so you need to restructure the team if you don't want that to happen. This is why I don't accept "system architect" roles as those don't give you the tools to really drive the architecture how it can be driven, and I really got into "management" :)
I am _not_ a microservices guy (like... at all) but reading this the "monorepo"/"microservices" false dichotomy stands out to me.
I think way too much tooling assumes 1:1 pairings between services and repos (_especially_ CI work). In huge orgs Git/whatever VCS you're using would have problems with everything in one repo, but I do think that there's loads of value in having everything in one spot even if it's all deployed more or less independently.
But so many settings and workflows couple repos together so it's hard to even have a frontend and backend in the same place if both teams manage those differently. So you end up having to mess around with N repos and can't send the one cross-cutting pull request very easily.
I would very much like to see improvements on this front, where one repo could still be split up on the forge side (or the CI side) in interesting ways, so review friction and local dev work friction can go down.
(shorter: github and friends should let me point to a folder and say that this is a different thing, without me having to interact with git submodules. I think this is easier than it used to be _but_)
I worked on building this at $PREV_EMPLOYER. We used a single repo for many services, so that you could run tests on all affected binaries/downstream libraries when a library changed.
We used Bazel to maintain the dependency tree, and then triggered builds based on a custom Github Actions hook that would use `bazel query` to find the transitive closure of affected targets. Then, if anything in a directory was affected, we'd trigger the set of tests defined in a config file in that directory (defaulting to :...), each as its own workflow run that would block PR submission. That worked really well, with the only real limiting factor being the ultimate upper limit of a repo in Github, but of course took a fair amount (a few SWE-months) to build all the tooling.
We’re in the middle of this right now. Go makes this easier: there’s a go CLI command that you can use to list a package’s dependencies, which can be cross-referenced with recent git changes. (duplicating the dependency graph in another build tool is a non-starter for me) But there are corner cases that we’re currently working through.
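A rough sketch of that cross-reference (assuming the tool runs from the repo root; the service path is made up, and real tooling would also need to handle non-Go files, module boundaries, and so on):

```go
package main

import (
	"fmt"
	"os/exec"
	"path/filepath"
	"strings"
)

// run executes a command and returns its stdout split into lines.
func run(name string, args ...string) []string {
	out, err := exec.Command(name, args...).Output()
	if err != nil {
		panic(err)
	}
	return strings.Split(strings.TrimSpace(string(out)), "\n")
}

func main() {
	// Directories of every package ./cmd/myservice transitively depends on.
	depDirs := map[string]bool{}
	for _, d := range run("go", "list", "-deps", "-f", "{{.Dir}}", "./cmd/myservice") {
		depDirs[d] = true
	}

	// Files changed on this branch; if any live in a dependency directory,
	// the service is affected and its build and tests should run.
	affected := false
	for _, f := range run("git", "diff", "--name-only", "origin/main...HEAD") {
		if dir, err := filepath.Abs(filepath.Dir(f)); err == nil && depDirs[dir] {
			affected = true
		}
	}
	fmt.Println("rebuild ./cmd/myservice:", affected)
}
```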
This, and if you want build + deploy that’s faster than doing it manually from your dev machine, you pay $$$ for either something like Depot, or a beefy VM to host CI.
A bit more work on those dependency corner cases, along with an auto-sleeping VM, should let us achieve nirvana. But it’s not like we have a lot of spare time on our small team.
Go with Bazel gives you a couple options:
* You can use gazelle to auto-generate Bazel rules across many modules - I think the most up to date usage guide is https://github.com/bazel-contrib/rules_go/blob/master/docs/g....
* In addition, you can make your life a lot easier by just making the whole repo a single Go module. Having done the alternate path - trying to keep go.mod and Bazel build files in sync - I would definitely recommend only one module per repo unless you have a very high pain tolerance or actually need to be able to import pieces of the repo with standard Go tooling.
> a beefy VM to host CI
Unless you really need to self-host, Github Actions or GCP Cloud Build can be set up to reference a shared Bazel cache server, which lets builds be quite snappy since it doesn't have to rebuild any leaves that haven't changed.
I've heard horror stories about Bazel, but a lot of them involve either not getting full buy in from the developer team or not investing in building out Bazel correctly. A few months of developer time upfront does seem like a steep ask.
You're pointing out exactly what bothered me with this post in the first place: "we moved from microservices to a monolith and our problems went away"... ... except the problems had not much to do with the service architecture but all to do with operational mistakes and insufficient tooling: bad CI, bad autoscaling, bad oncall.
Both approaches can fail. Especially in environments like Node.js or Python, there's a clear limit to how much code an event loop can handle before performance seriously degrades.
I managed a product where a team of 6–8 people handled 200+ microservices. At the same time I also managed teams on another product where 80+ people maintained a monolith.
What I learned? Both approaches have pros and cons.
With microservices, it's much easier to push isolated changes with just one or two people. At the same time, global changes become significantly harder.
That's the trade-off, and your mental model needs to align with your business logic. If your software solves a tightly connected business problem, microservices probably aren't the right fit.
On the other hand, if you have a multitude of integrations with different lifecycles but a stable internal protocol, microservices can be a lifesaver.
If someone tries to tell you one approach is universally better, they're being dogmatic/religious rather than rational.
Ultimately, it's not about architecture, it's about how you build abstractions and approach testing and decoupling.
> If your software solves a tightly connected business problem, microservices probably aren't the right fit.
If your software solves a single business problem, it probably belongs in a single (still micro!) service under the theory underlying microservices, in which the "micro" is defined in business terms.
If you are building services at a lower level than that, they aren't microservices (they may be nanoservices.)
How do people usually slice a single business problem?
To me this rationalization has always felt like duct tape over the real problem, which is that the runtime is poorly suited to what people are trying to do.
These problems are effectively solved on beam, the jvm, rust, go, etc.
Can you explain a bit more about what you mean by a limit on how much code an event loop can handle? What's the limit, numerically, and which units does it use? Are you running out of CPU cache?
I assume he means, how much work you let the event loop do without yielding. It doesn't matter if there's 200K lines of code but no real traffic to keep the event loop busy.
Most people don't realize their applications are running like dogwater on Node because serverless is letting them smooth it over by paying 4x what they would be paying if they moved 10 or so lines of code and a few regexes to a web worker.
(and I say that as someone who caught themselves doing the same: severless is really good at hiding this.)
Wait, do people at scale use NodeJS and Python for services? I assume always it’s Go, Java, C# etc.
Depends on your definition of "scale", but yes. I ran an app serving ~1k requests/second from a Django monolith around 2017, distributed across ~20 Heroku "dynos". Nowadays a couple bare-metal servers will handle this.
The only rationale given for the initial switch to microservices is this:
> Initially, when the destinations were divided into separate services, all of the code lived in one repo. A huge point of frustration was that a single broken test caused tests to fail across all destinations.
You kept breaking tests in main so you thought the solution was to revamp your entire codebase structure? Seems a bit backward.
We had a similar problem in our monolith. Team #1 works on a feature, their code breaks tests. Team #2 works on another feature, their code is OK, but they can't move forward because of the failing tests from team #1. Plus, it often takes additional time to figure out whether the tests fail because of feature #1 or feature #2, and who must fix them.
We solved it by simply giving every team their own dev environment (before merging to main). So if tests break in feature #1, it doesn't break anything for feature #2 or team #2. It's all confined to their environment. It's just an additional VM + a config in CI/CD. The only downside of this is that if there are conflicts between features they won't be caught immediately (only after one of the teams finally merges to main). But in our case it wasn't a problem because different teams rarely worked on the same parts of the monorepo at the same time due to explicit code ownership.
Thanks. It was a stupid idea for MOST shops. I think maybe it works for AWS, Google and Netflix, but everywhere in my career, 90% of the problems I saw were due to microservices.
Dividing a system into composable parts is already a very difficult problem, and it is simply foolish to introduce further network boundaries between them.
The next comeback I expect is a move away from React and SPAs as view transitions become more common.
can you add [2018] to the title, please?
have they reverted to microservices?
Mono services in a micro repository. /s
No kidding, not cool to be rehashing an article that is 7 years old. In tech terms, that is antiquity.
> Once the code for all destinations lived in a single repo, they could be merged into a single service. With every destination living in one service, our developer productivity substantially improved. We no longer had to deploy 140+ services for a change to one of the shared libraries. One engineer can deploy the service in a matter of minutes.
This is the problem with the undefined nature of the term `microservices`. In my experience, if you can't develop in a way that allows you to deploy every service independently and without coordination between services, it may not be a good fit for your org's needs.
In the parent SOA(v2), what they described is a well known anti-pattern: [0]
If you cannot, for technical or political reasons, retain the ability to independently deploy a service (regardless of whether you choose to actually deploy independently), you will not gain most of the advantages that were the original selling point of microservices, which had more to do with organizational scaling than technical concerns. There are other reasons to consider the pattern, especially given the tooling available, but it is simply not a silver bullet.
And yes, I get that not everyone is going to accept Chris Richardson's definitions[1], but even in more modern versions of this, people always seem to run into the most problems because they try to shove it in a place where the pattern isn't appropriate, or isn't possible.
But kudos to Twilio for doing what every team should be, reassessing if their previous decisions were still valid and moving forward with new choices when they aren't.
[0] https://www.oracle.com/technetwork/topics/entarch/oea-soa-an... [1] https://microservices.io/post/architecture/2022/05/04/micros...
I would caution that microservices should be architected with technical concerns first; being able to deploy independently is a valid technical concern too.
Doing it for organizational scaling can lead to insular vision with turf defensive attitude, as teams are rewarded on the individual service’s performance and not the complete product’s performance. Also refactoring services now means organizational refactoring, so the friction to refactor is massively increased.
I agree that patterns should be used where most appropriate, instead of blindly.
What pains me is that a term like “Cloud-Native” has been usurped to mean microservices. Did Twilio just stop having a “Cloud-Native” product by shipping a monolith? According to the CNCF, yes. According to reason, no.
In a discussion I was in recently, a participant mentioned "culture eats strategy for breakfast" .. which perhaps makes sense in this context. Be bold enough to do what makes the team and the product thrive.
> With everything running in a monolith, if a bug is introduced in one destination that causes the service to crash, the service will crash for all destinations
We can have a service with 100 features, but only enable the features relevant to a given "purpose". That way, we can still have "micro services" but they're running the same code: "bla.exe -foo" and "bla.exe -bar".
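As a rough sketch of that idea (hypothetical feature names and flag parsing, not any particular framework): one codebase, one artifact, and the deployment's purpose chosen at startup.

```typescript
// app.ts - a single build; each deployment enables only the features it needs,
// e.g. `node app.js --foo` for one "purpose" and `node app.js --bar` for another.
type Feature = "foo" | "bar" | "baz";

const registry: Record<Feature, () => void> = {
  foo: () => console.log("foo endpoints registered"),
  bar: () => console.log("bar endpoints registered"),
  baz: () => console.log("baz endpoints registered"),
};

// Enable only the features named on the command line.
const enabled = process.argv
  .slice(2)
  .map((arg) => arg.replace(/^--/, ""))
  .filter((name): name is Feature => name in registry);

for (const feature of enabled) {
  registry[feature]();
}
```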
Your “microservice” is just a clumsy & slow symbol lookup over the network, at 1000x the cpu and 10000x the latency.
I have a feeling that microservices improve overall design when they can live on their own, perhaps as microapps with their own UI. What is the point of a service if it is not usable beyond its original design and is just bound to other similar services?
> we had to spend time fixing the broken test even if the changes had nothing to do with the initial change. In response to this problem, it was decided to break out the code for each destination into their own repos.
One could also change the way tests are run or selected, or allow manual overrides to still deploy. Separating repos doesn't sound like the only logical solution.
In practice most monoliths turned into "microservices" are just monoliths in disguise. They still have most of the failure modes of the original monolith, but now with all the complexity and considerable challenges of distributed computing layered on top.
Microservices as a goal is mostly touted by people who don't know what the heck they're doing - the kind of people who tend to mistakenly believe blind adherence to one philosophy or the other will help them turn their shoddy work into something passable.
Engineer something that makes sense. If, once you're done, whatever you've built fits the description of "monolith" or "microservices", that's fine.
However if you're just following some cult hoping it works out for your particular use-case, it's time to reevaluate whether you've chosen the right profession.
Microservices were a fad during a period where complexity and solving self-inflicted problems were rewarded more than building an actual sustainable business. It was purely a career- & resume-polishing move for everyone involved.
Putting this anywhere near "engineering" is an insult to even the shoddiest, OceanGate-levels of engineering.
I remember when microservices were introduced and they were solving real problems around 1) independent technological decisions with languages, data stores, and scaling, and 2) separating team development processes. They came out of Amazon, eBay, Google and a host of successful tech titans that were definitely doing "engineering." The Bezos mandate for APIs in 2002 was the beginning of that era.
It was when the "microservices considered harmful" articles started popping up that microservices had become a fad. Most of the HN early-startup energy will continue to do monoliths because of team communication reasons. And I predict that if any of those startups are successful, they will have need for separate services for engineering reasons. If anything, the historical faddishness of HN shows that hackers pick the new and novel because that's who they are, for better or worse.
Wow. Their experience could not be more different than mine. As I’m contemplating the first year of my startup, I’ve tallied 6000 deployments and 99.997 percent uptime with a low single-digit rollback percentage (MTTR in low single-digit minutes and fractional, single-cell impact for them so far). While I’m sure it’s possible for a solo entrepreneur to hit numbers like that with a monolith, I have never done so, and haven’t seen others do so.
Edit: I’d love to eat the humble pie here. If you have examples of places where monoliths are updated 10-20 times a day by a small (or large) team post the link. I’ll read them all.
The idea of deploying to production 10-20 times per day sounds terrifying. What's the rationale for doing so?
I'll assume you're not writing enough bugs that customers are reporting 10-20 new ones per day, but that leaves me confused why you would want to expose customers to that much churn. If we assume an observable issue results in a rollback and you're only rolling back 1-2% of the time (very impressive), once a month or so customers should experience observable issues across multiple subsequent days. That would turn me off making a service integral to my workflow.
If something is difficult or scary, do it more often. Smaller changes are less risky. Code that is merged but not deployed is essentially “inventory” in the factory metaphor. You want to keep inventory low. If the distance between the main branch and production is kept low, then you can always feel pretty confident that the main branch is in a good state, or at least close to one. That’s invaluable when you inevitably need to ship an emergency fix. You can just commit the fix to main instead of trying to find a known good version and patching it. And when a deployment does break something, you’ll have a much smaller diff to search for the problem.
There's a lot of middle ground between "deploy to production 20x a day" and "deploy so infrequently that you forget how to deploy". Like, once a day? I have nothing against emergency fixes, unless you're doing them 9-19x a day. Hotfixes should be uncommon (neither rare nor standard practice).
Org size matters. A team of 500 should be deploying multiple times per day.
Speed is the rationale. I have zero hesitation to deploy and am extremely well practiced at decomposing changes into a series of small safe changes at this point. So maybe it's a single spelling correction, or perhaps it's the backend for a new service integration -- it's all the same to me.
Churn is kind of a loaded word, I'd just call it change. Improvements, efficiencies, additions and yes, of course, fixes.
It may be a little unfair to compare monoliths with distributed services when it comes to deployments. I often deploy three services (sometimes more) to implement a new feature, and that wouldn't be the case with a monolith. So 100% there is a lower number of deploys needed in that world (I know, I've been there). Unfortunately, there is also a natural friction that prevents deploying things as they become available. Google called that latency out in DORA for a reason.
I can't believe how many times I have seen companies try to implement microservices with multirepo, get lost in access management and versioning, and end up producing a fragile, overly complex hot mess.
Discussion in 2018, when this blog post was published: https://news.ycombinator.com/item?id=17499137
Monolith is definitely what you want to start with.
Being able to ~instantly obtain a perfect list of all references to all symbols is an extraordinarily powerful capability. The stronger the type system, the more leverage you get. If you have only ever had experience with weak type systems or poor tooling, I could understand how the notion of putting everything into one compilation context seems pointless.
/me raises hand. Any system that passes querysets around for one. Can't know who is using what.
They obviously read https://grugbrain.dev/#grug-on-microservices
A recent blog post from Docker mentions about Twilio and Amazon Prime Video seeing gains by moving away from microservices to monolith
You Want Microservices, But Do You Really Need Them? https://www.docker.com/blog/do-you-really-need-microservices...
I humbly post this little widget to help your team decide if some functionality warrants being a separate service or not: https://mulch.dev/service-scorecard/
I don't think this blog post reflects so well on this engineering team. Kudos to them to be so transparent about it though. "We had so many flaky tests that depended on 3rd parties that broke pipelines that we decided on micro-services" is not something I would put on my CV at least.
That seems unfair. There's a lot we don't know about the politics behind the scenes. I'd bet that the individuals who created the microservice architecture aren't the same people who re-consolidated them into one service. If true, the authors of the article are being generous to the original creators of the microservices, which I think reflects well on them for not badmouthing their predecessors.
Too much of anything sucks. Too big of a monolith? Sucks. Too many microservices? Sucks. Getting the right balance is HARD.
Plus, it's ALWAYS easier/better to run v2 of something when you completely re-write v1 from scratch. The article could have just as easily been "Why Segment moved from 100 microservices to 5" or "Why Segment rewrote every microservice". The benefits of hindsight and real-world data shouldn't be undersold.
At the end of the day, write something, get it out there. Make decisions, accept some of them will be wrong. Be willing to correct for those mistakes or at least accept they will be a pain for a while.
In short: No matter what you do the first time around... it's wrong.
This is a horror story of being totally unable to understand your product and its behavior and throwing people and resources in large rewrites to only learn that you still don't understand your product and its behavior. Badly done tests used as a justification to write multiple suites of badly done tests and it is all blamed on the architecture.
Some important context to this 2018 article is given here: https://www.twilio.com/en-us/blog/archive/2018/introducing-c...
TL;DR they have a highly partitioned job database, where a job is a delivery of a specific event to a specific destination, and each partition is acted upon by at-most-one worker at a time, so lock contention is only at the infrastructure level.
In that context, each worker can handle a similar balanced workload between destinations, with a fraction of production traffic, so a monorepo makes all the sense in the world.
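A toy sketch of the at-most-one-worker-per-partition idea (an in-memory lease table with made-up names; in the real system this lives at the infrastructure level, in the job database):

```typescript
// A partition can be claimed by at most one worker at a time, so job handling
// inside a partition never contends with another worker.
class PartitionLeases {
  private owners = new Map<string, string>();

  // Returns true only for the first worker to claim a free partition.
  tryAcquire(partition: string, workerId: string): boolean {
    if (this.owners.has(partition)) return false;
    this.owners.set(partition, workerId);
    return true;
  }

  release(partition: string, workerId: string): void {
    if (this.owners.get(partition) === workerId) {
      this.owners.delete(partition);
    }
  }
}

// Each worker only drains partitions it successfully leased.
const leases = new PartitionLeases();
if (leases.tryAcquire("destination-42", "worker-a")) {
  // ...process queued deliveries for this partition, then:
  leases.release("destination-42", "worker-a");
}
```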
IMO it speaks to the way in which microservices can be a way to enforce good boundaries between teams... but the drawbacks are significant, and a cross-team review process for API changes and extensions can be equally effective and enable simplified architectures that sidestep many distributed-system problems at scale.
They also failed as a company, which is why that's on Twilio's blog now. So there's that. Undoubtedly their microservices architecture was a bad fit because of how technically focused the product was. But their solution with a monolith didn't have the desired effect either.
Failed? It was a $3.2B acquisition with a total of 283M raised. I don’t see any way that’s a failure.
That said I’m curious if you’re basing this on service degradation you’ve seen since the acquisition. We were thinking of starting to use them - is that a bad move?
By all means use Segment. Segment was a great technology with an incredible technical vision for what they wanted to do. I was in conversations in that office on Market far beyond what they ended up doing post-acquisition.
But a company that can't stand on its own isn't a success in my opinion. Similar things can be said about companies that continue to need round after round of funding without an IPO.
My comment is of the "(2018)" variety. Old news that didn't age well like the people jumping on the "Uber: why we switched to MySQL from Postgres" post. (How many people would choose that decision today?)
People tend to divorce the actual results of a lot of these companies from the gripes of the developers of the tech blogs.
You can have infrastructure complexity (microservices) or trade it for development complexity (monolith).
Choose one.
I don't care how it is done, just don't rely on your database schema for data modeling and business logic.
This is not the first time that an engineer working at a big company thinks they are using a monolith when in reality they are a small team in charge of a single microservice, which in turn is part of a company that definitely does not run a monolith.
Last time it was an AWS engineer who worked on Route 53, and they dismissed microservices in a startup, claiming that at AWS they ran a monolith (as in the R53 DNS).
Everything is a monolith if you zoom in enough and ignore everything else. Which I guess you can do when you work on a big company and are in charge of a very specific role.
"Microservices is the software industry’s most successful confidence scam. It convinces small teams that they are “thinking big” while systematically destroying their ability to move at all. It flatters ambition by weaponizing insecurity: if you’re not running a constellation of services, are you even a real company? Never mind that this architecture was invented to cope with organizational dysfunction at planetary scale. Now it’s being prescribed to teams that still share a Slack channel and a lunch table.
Small teams run on shared context. That is their superpower. Everyone can reason end-to-end. Everyone can change anything. Microservices vaporize that advantage on contact. They replace shared understanding with distributed ignorance. No one owns the whole anymore. Everyone owns a shard. The system becomes something that merely happens to the team, rather than something the team actively understands. This isn’t sophistication. It’s abdication.
Then comes the operational farce. Each service demands its own pipeline, secrets, alerts, metrics, dashboards, permissions, backups, and rituals of appeasement. You don’t “deploy” anymore—you synchronize a fleet. One bug now requires a multi-service autopsy. A feature release becomes a coordination exercise across artificial borders you invented for no reason. You didn’t simplify your system. You shattered it and called the debris “architecture.”
Microservices also lock incompetence in amber. You are forced to define APIs before you understand your own business. Guesses become contracts. Bad ideas become permanent dependencies. Every early mistake metastasizes through the network. In a monolith, wrong thinking is corrected with a refactor. In microservices, wrong thinking becomes infrastructure. You don’t just regret it—you host it, version it, and monitor it.
The claim that monoliths don’t scale is one of the dumbest lies in modern engineering folklore. What doesn’t scale is chaos. What doesn’t scale is process cosplay. What doesn’t scale is pretending you’re Netflix while shipping a glorified CRUD app. Monoliths scale just fine when teams have discipline, tests, and restraint. But restraint isn’t fashionable, and boring doesn’t make conference talks.
Microservices for small teams is not a technical mistake—it is a philosophical failure. It announces, loudly, that the team does not trust itself to understand its own system. It replaces accountability with protocol and momentum with middleware. You don’t get “future proofing.” You get permanent drag. And by the time you finally earn the scale that might justify this circus, your speed, your clarity, and your product instincts will already be gone."
-DHH
Also from DHH: microservices were a zero-interest rate phenomena https://youtu.be/iqXjGiQ_D-A?t=924
Is it 2018? Are you guys going to repost the MySQL DB as a queue story again? Perhaps an announcement that you’re migrating to Java 9 and what you learned about generics?
I left Twilio in 2018. I spent a decade at SendGrid. I spent a small time in Segment.
The shitty arch is not a point against (micro)services. SendGrid, another Twilio property, uses (micro)services to great effect. Services there were fully independently deployable.
The whole point of micro-services is to manage dependencies independently across service boundaries, using the API as the contract, not the internal libraries.
Then you can implement a service in Java, Python, Rust, C++, etc, and it doesn't matter.
Coupling your Postgres DB to your Elasticsearch cluster via a hard library dependency is impossibly heavy. The same insight applies to your bespoke services.
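A small sketch of "the API is the contract" (hypothetical endpoint and type names): the only thing consumers share is the wire shape, so the service behind the URL can be rewritten in any language without anyone else redeploying.

```typescript
// contract.ts - the shared surface is a JSON shape and a URL, never a library.
export interface DeliveryRequest {
  destination: string;
  event: Record<string, unknown>;
}

export interface DeliveryResult {
  accepted: boolean;
  retryAfterMs?: number;
}

// client.ts - a consumer depends only on the contract above (Node 18+ fetch).
export async function deliver(
  baseUrl: string,
  req: DeliveryRequest,
): Promise<DeliveryResult> {
  const res = await fetch(`${baseUrl}/v1/deliveries`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(req),
  });
  return (await res.json()) as DeliveryResult;
}
```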
They have a monolith but struggle with individual subsystem failures bringing down the whole thing. Sounds like they would benefit from Elixir’s isolated, fail-fast architecture.
Great writeup. Much of this is more about testing, how package dependencies are expressed, and many-repo/single-repo trade-offs than about "microservices"!
Maintaining and testing a codebase containing many external integrations ("Destinations") was one of the drivers behind the earlier decision to shatter into many repos: it isolated the impact of Destination-specific test suite failures, which happened because some tests were actually exercising integrations with external 3rd-party services.
One way to think about that situation is in terms of packages, their dependency structure, how those dependencies are expressed (e.g. decoupled via versioned artefact releases, or directly coupled via monorepo-style source checkout), their rates of change, and the quality of their automated test suites (high quality meaning the suite runs really fast, tests only the thing it is meant to test, and has low rates of false negatives and false positives; low quality meaning the opposite).
Their initial situation was one that rapidly becomes unworkable: a shared library package undergoing a high rate of change depended on by many Destination packages, each with low quality test suites, where the dependencies were expressed in a directly-coupled way by virtue of everything existing in a single repo.
There's a general principle here: multiple packages in a single repo with directly-coupled dependencies, where those packages have test suites of wildly varying quality, quickly become a nightmare to maintain. Packages with low-quality test suites that depend upon high-quality, rapidly changing shared packages generate spurious test failures that need to be triaged and slow down development. Maintainers of packages that depend upon a rapidly changing shared package but do not have test suites able to detect regressions may find their package frequently gets broken without anyone realising in time.
Their initial move solved this problem by shattering the single repo and trading directly-coupled dependencies for decoupled versioned dependencies, decoupling the rate of change of the shared package from the per-Destination packages. That was an incremental improvement, but it added the complexity and overhead of maintaining multiple versions of the "shared" library and per-repo boilerplate, which grow over time as more Destinations are added or more changes are made to the shared library while the work to upgrade and retest Destinations against it is deferred.
Their later move was to reverse this, go back to directly-coupled dependencies, but instead improve the quality of their per-Destination test suites, particularly by introducing record/replay style testing of Destinations. Great move. This means that the test suite of each Destination is measuring "is the Destination package adhering to its contract in how it should integrate with the 3rd party API & integrate with the shared package?" without being conflated with testing stuff that's outside of the control of code in the repo (is the 3rd party service even up, etc).
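For readers unfamiliar with the record/replay pattern, here is a minimal hand-rolled sketch (hypothetical helper and URLs, not their actual test framework): a real 3rd-party response is recorded once, then replayed in CI so the Destination's suite no longer depends on the 3rd party being reachable.

```typescript
import { promises as fs } from "node:fs";

// In "record" mode, hit the real 3rd-party API and save the response body;
// in "replay" mode (used in CI), serve the saved fixture instead.
async function fetchWithCassette(
  url: string,
  cassettePath: string,
  mode: "record" | "replay",
): Promise<string> {
  if (mode === "replay") {
    return fs.readFile(cassettePath, "utf8");
  }
  const body = await (await fetch(url)).text();
  await fs.writeFile(cassettePath, body, "utf8");
  return body;
}

// A test now asserts how the Destination handles a known response,
// not whether the 3rd-party service happens to be up today.
async function testDestinationParsesAck(): Promise<void> {
  const body = await fetchWithCassette(
    "https://api.example-destination.test/ack",
    "./fixtures/ack.json",
    "replay",
  );
  if (!JSON.parse(body).ok) {
    throw new Error("Destination mis-parsed the acknowledgement");
  }
}
```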
(2018)
Some of this sounds like the journey to EJBs and back.
> Microservices is a service-oriented software architecture in which server-side applications are constructed by combining many single-purpose, low-footprint network services.
Gonna stop you right there.
Microservices have nothing to do with the hosting or operating architecture.
According to Martin Fowler, who formalized the term, microservices are:
“In short, the microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API. These services are built around business capabilities and independently deployable by fully automated deployment machinery”
You can have an entirely local application built on the “microservice architectural style.”
Saying they are “often HTTP and API” is beside the point.
The problem Twilio actually describes is that they messed up service granularity and their distributed-systems engineering processes.
Twilio's experience was not a failure of the microservice architectural style. This was a failure to correctly define service boundaries based on business capabilities.
Their struggles with serialization, network hops, and complex queueing were symptoms of building a distributed monolith, which they finally made explicit with this move. They accidentally built a system with the overhead of distribution but the tight coupling of a single application. Now they are making their architectural foundations fit what they built, likely because they planned it poorly.
The true lesson is that correctly applying microservices requires insanely hard domain modeling and iteration and meticulous attention to the "Distributed Systems Premium."
https://martinfowler.com/microservices/
Please don’t fall into the Fowler-said-so trap.
Just because he says something does not mean Fowler “formalized the term”. Martin wrote about every topic under the sun, and he loved renaming and or redefining things to fit his world view, and incidentally drive people not just to his blog but also to his consultancy, Thoughtworks.
PS The “single application” line shows how dated Fowler's views were then and certainly are today.
I've been developing under that understanding since before Fowler-said-so. His take is simply a description of a phenomenon predating the moniker of microservices. SOA with things like CORBA, WSDL, UDDI, Java services in app servers etc. was a take on service oriented architectures that had many problems.
Anyone who has ever developed in a Java codebase with "Service" and then "ServiceImpl"s everywhere can see the lineage of that model. Services were supposed to be the API, with the implementation provided in a separate process container. Microservices signalled a time when SOA without Java as a prerequisite had become successful in large tech companies. They had reached the point of needing even more granular breakout and a reduction of reliance on Java, and HTTP interfaces were an enabler of that. 2010s-era microservices people never understood the basics, and many don't even know what they're criticizing.
Thank you, this is the point.
I feel like microservices have gotten a lot easier over the last 7 years from when Twilio experienced this, not just from my experience but from refinements in architectures
There are infinite permutations in architecture and we've collectively narrowed them down to things that are cheap to deploy, automatically scale for low costs, and easily replicable with a simple script
We should be talking about how AI knows those scripts too and can synthesize adjustments; dedicated Site Reliability Engineers and DevOps are great for maintaining convoluted legacy setups, but irrelevant for doing the same thing from scratch nowadays.
You know what I think is better than a push of the CPU stack pointer and a jump to a library?
A network call. Because nothing could be better for your code than putting the INTERNET into the middle of your application.
--
The "micro" of microservices has always been ridiculous.
If it can run on one machine then do it. Otherwise you have to deal with networking. Only do networking when you have to. Not as a hobby, unless your program really is a hobby.
Microservices have nothing to do with the underlying hosting architecture. Microservices can all run and communicate on a single machine. There will be a local network involved, but it absolutely does not require the internet or multiple machines.
it's not really "micro" but more so "discrete", as in special-purpose, one-off: to ensure consistent performance as opposed to shared performance.
yes, networking is the bottleneck between the processes, while one machine is the bottleneck to end users
> one machine is the bottleneck to end users
You can run your monolith on multiple machines and round-robin end-user requests between them. Your state is in the DB anyway.
I do bare metal sometimes and I like the advances in virtualization for many processes there too
Not everything you think you know is right.
https://github.com/sirupsen/napkin-math
Well implemented network hardware can have high bandwidth and low latency. But that doesn't get around the complexity and headaches it brings. Even with the best fiber optics, wires can be cut or tripped over. Controllers can fail. Drivers can be buggy. Networks can be misconfigured. And so on. Any request - even sent over a local network - can and will fail on you eventually. And you can't really make a microservice system keep working properly when links start failing.
Local function calls are infinitely more reliable. The main operational downside with a binary monolith is that a bug in one part of the program will crash the whole thing. Honestly, I still think Erlang got it right here with supervisor trees. Use "microservices". But let them all live on the same computer, in the same process. And add tooling to the runtime environment to allow individual "services" to fail or get replaced without taking down the rest of the system.
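As a rough sketch of that supervisor idea, in TypeScript rather than Erlang and with made-up service names: each in-process "service" gets its own restart loop, so one crashing repeatedly never takes down its siblings.

```typescript
// A toy supervisor: restart a failing in-process "service" up to a budget,
// without affecting any other service running in the same process.
type Service = { name: string; run: () => Promise<void> };

async function supervise(svc: Service, maxRestarts = 5): Promise<void> {
  for (let attempt = 1; attempt <= maxRestarts; attempt++) {
    try {
      await svc.run(); // returns only on a clean shutdown
      return;
    } catch (err) {
      console.error(`${svc.name} crashed (attempt ${attempt}):`, err);
    }
  }
  console.error(`${svc.name} exceeded its restart budget; giving up on it only.`);
}

// Each service is supervised independently; a crash in "deliver" leaves
// "ingest" running untouched.
const services: Service[] = [
  { name: "ingest", run: async () => { /* ...pull events... */ } },
  { name: "deliver", run: async () => { /* ...push to destinations... */ } },
];
services.forEach((svc) => void supervise(svc));
```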
Do you have any recommended reading on the topic of refinements in architectures? Thank you.
These "we moved from X to Y" posts are like Dunning-Kruger humblebrags. Yes, we all lack information and make mistakes. But there's never an explanation in these posts of how they've determined their new decision is any less erroneous than their old decision. It's like they threw darts at a wall and said "cool, that's our new system design (and SDLC)". If you have not built it yourself before, and have not studied in depth an identical system, just assume you are doing the wrong thing. Otherwise you are running towards another Dunning-Kruger pit.
If you have a company that writes software, please ask a professional software/systems architect to review your plans before you build. The initial decisions here would be a huge red flag to any experienced architect, and the subsequent decisions are full of hidden traps, and are setting them up for more failure. If you don't already have a very skilled architect on staff (99% chance you don't) you need to find one and consult with them. Otherwise your business will suffer from being trapped in unnecessary time-consuming expensive rework, or worse, the whole thing collapsing.
The “distributed monolith” line is the key takeaway here.
Microservices only buy you something if teams can deploy, version, and reason about them independently. Once shared libraries or coordinated deploys creep in, you’ve taken on all the operational cost with none of the autonomy benefits.
I’ve seen monoliths with clear module boundaries outperform microservice setups by an order of magnitude in developer throughput.
If microservices or a monolith are giving you an order-of-magnitude improvement in productivity, you are clearly doing something wrong or have terrible practices.