tl;dr: monoliths suck and microservices are awesome
but also
tl;dr: microservices suck and monoliths are awesome
Confused? So are most people – so much so that when we started our cloud journey we held a public debate, for a bit of fun, on whether we had decomposed too far (we had around 30 microservices in Identity at the time). The reality is that both approaches are great options and both have drawbacks, but IMO there are also pitfalls that teams fall into. So let's talk about one of them – technology choices.

The monolith is ideal
There is nothing inherently old / crufty / broken about a monolith. You have one artefact to code / build / deploy / reason about. The opportunities for failures due to operational concerns (partitions / network latency etc.) are much more limited. Monoliths are simple, and keeping things as simple as they can be – and no simpler – is what good engineering is about. Microservices are a tradeoff against a monolith in response to the pressures of scaling. We trade away some of these properties so we can scale:
- the number of engineers you want to be able to contribute – you'll be adding more people, and a single codebase becomes unwieldy over time
- deployments – you want to maintain a fast pace of small changes as you add engineers. That becomes increasingly difficult with more contributors to a single codebase
- non-functional needs – some parts of the system may deal with orders of magnitude more load / faster response time and potentially the need to pre-scale the system to deal with that load. Some parts may be more critical than others and hence be subject to different expectations of reliability.
A good starting point: the best system is the simplest one that meets your needs and is the most amenable to future change. Monoliths in the early stages tick these boxes; microservices may help manage complexity, but they also introduce opportunities to create it. There are a lot of claims about microservices that are superficially true but ultimately don't play out to your benefit.
Teams own services, so teams should have free rein to pick and choose the tools they use to build them. True… but not as most folks interpret it.
At that time we had a few disparate applications (I'd not call them services; they were software run on virtual machines whose provisioning we'd automated for each customer). Those applications were (for the most part) written in Java using common approaches. However, as we embraced decomposition of those systems, building new ones and moving toward the autonomy of You Build It You Run It, teams began to experiment with other approaches. Some teams embraced functional programming in a big way in Scala. Some went the Node route for its non-blocking benefits. Having microservices meant each team didn't need to choose the same language or even the same paradigm. For the engineers who chose their favourite languages and paradigms it was all beer and skittles for a while. But…
Teams are not static. We started down the microservice path because teams needed to scale and we wanted to move faster. Now we've got a bunch of diverse technical options, each with its own toolchains, frameworks and, ultimately, ramp time to move across them. Hiring FP Scala experts is challenging. Forget hiring – what about malleability in your org, and the ability to move people onto the most important work? Not every senior Java engineer joining to have impact and deliver value for customers wants to spend their first several months learning a new paradigm.
Features don’t respect service boundaries. I’m a big fan of domain-driven design. I am also a big fan of incremental delivery, and it isn’t pragmatic to build out every service that owns a noun or verb in isolation. Services grow over time in response to feature requests that meet the needs of customers. Hence engineers need to collaborate across service boundaries, and you’ve now inherently made the team skilled in a service the bottleneck – the rate limit on change to it. You can argue there is more to ramp up on when moving between services than language, frameworks and tools (like the domain itself). This is true, but the ramp isn’t linear: the harder you make it to contribute even small changes, the less teams look outside, and the less they know, the harder contribution becomes.
Services move between teams. As your organisation develops it will make sense to reshuffle missions and hence the services a team operates. You can end up with teams that have many different services on many different stacks and a fragmented ability to support them (if at all). You end up with silos that exacerbate the challenge of moving work forward; moreover, sustaining an on-call list becomes problematic. Ultimately you end up needing urgent service rewrites to a stack that can be supported.
I read somewhere that a microservice is a service you can rewrite in a couple of weeks. I’ve not known many services that are so simple, but if you are running large-scale SaaS you will not simply be switching off one service and turning on another. You will be rebuilding your testing suite and ensuring it is accurate. You will want to compare responses between the new system and the old (i.e. shadow traffic). You will want to verify the non-functional aspects are better than or equivalent (ideally you want no change – I’ve seen incidents caused by unexpected performance improvements removing a bottleneck that was effectively a rate limit for parts of the system, which otherwise break when you speed them up). The reality is I’ve yet to see a rewrite take less than a quarter for a team to do successfully (and often longer). Long story short, there is no cheap rewrite when you are dealing with systems of non-trivial complexity (which often comes from non-functional requirements, specified or otherwise).
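The shadow-traffic idea above can be sketched in a few lines. This is a minimal illustration, not the author's actual setup: `legacy_handler` and `candidate_handler` are hypothetical stand-ins for the old and rewritten services, and in production the mirrored call would typically be asynchronous, sampled, and feed a metrics pipeline rather than an in-memory list.

```python
# Shadow-traffic sketch: serve every request from the legacy service while
# mirroring it to the candidate rewrite and recording any response drift.

mismatches = []  # in production: a metric / diff log, not a list

def legacy_handler(request: dict) -> dict:
    # Stand-in for the existing service.
    return {"user": request["id"], "plan": "pro"}

def candidate_handler(request: dict) -> dict:
    # Stand-in for the rewrite; here it drifts on one field.
    return {"user": request["id"], "plan": "professional"}

def shadow(request: dict) -> dict:
    primary = legacy_handler(request)  # the response the caller receives
    try:
        shadowed = candidate_handler(request)
        if shadowed != primary:  # record drift, never fail the caller
            mismatches.append({"request": request, "old": primary, "new": shadowed})
    except Exception as exc:  # a crashing candidate is also just a diff
        mismatches.append({"request": request, "error": repr(exc)})
    return primary  # callers only ever see the legacy result

response = shadow({"id": 42})
```

The key property is that the rewrite can be wrong (or down) without customers noticing: the legacy response is always returned, and disagreements become data you can burn down before cutting over.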
As your organisation grows you’ll want to gain efficiencies in maintenance: common approaches to keeping restricted information out of logs, or avoiding N patches of the same common vulnerability across every service you have. Having fewer options to support (ideally one) means you can lean into these benefits – you will have more folks to contribute to the common library team. As you have incidents (and you will), you will look for ways to enshrine the lessons from them in base images, libraries and anywhere else you can, to minimise reliance on organisation-wide education with 100% action and retention as a strategy for preventing future incidents.
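To make the "common library" point concrete, here is a tiny hypothetical example of the kind of shared helper such a team might own: one redaction function that every service uses, rather than N services each re-implementing (and re-breaking) their own. The specific patterns are illustrative only.

```python
import re

# Hypothetical shared logging helper: one place to keep restricted
# information out of logs across every service that depends on it.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<email>"),   # email addresses
    (re.compile(r"(?i)bearer\s+\S+"), "<token>"),           # auth tokens
]

def redact(message: str) -> str:
    """Replace restricted values with placeholders before logging."""
    for pattern, replacement in REDACTIONS:
        message = pattern.sub(replacement, message)
    return message

line = redact("login ok for alice@example.com with Bearer abc123")
```

Fixing a gap in one pattern here fixes it for the whole fleet in one release – which is exactly the leverage the paragraph above argues you lose when every stack needs its own copy.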
In short…
Teams should choose their tooling options, but from a (very) constrained set. Moreover, differences in choice should be grounded in the needs of the customer (performance, scale). For example, we have supported, and continue to support, different data stores that are optimised for different types of queries that you otherwise couldn’t deliver at scale. However, I don’t buy the argument that strongly typed languages have reliability benefits that justify divergence from tools like Node, or that non-blocking in Node justifies divergence from JVM-based approaches (Kotlin + coroutines FTW). I’ve simply not seen language/framework choices at that level have a material impact on outcomes for customers; I have, however, seen divergence create artificial barriers to collaboration and constraints on execution that have had a material impact on delivering value for customers. Rectifying those decisions later on is expensive and disruptive.
So if you are thinking of decomposing and heading toward the panacea of any-team-does-anything microservices, think carefully about which benefits of the monolith you are leaving behind and which ones you can keep (like a consistent developer experience and a low ramp when moving across your services).