Monoliths To Microservices: Lessons From Netflix, Snap, Uber, and Amazon
Avoid reinventing the wheel with open-source software like Thrift, Envoy, and Kubernetes, but make sure the switch is worth it first.
We often break legacy monolithic architectures out into separate services for various reasons, including flexibility, scalability, and development velocity. The benefits of the switch are easy to see, but the challenges are harder to fully appreciate. Let’s zoom in on a few companies known for their technical prowess and see how they navigated these waters.
Snap’s Leap From Monolith to Microservices
Snap successfully scaled its business with the product running on Google App Engine until 2018, when tight coupling, architectural inflexibility, cloud vendor lock-in, and cost pushed them to reconsider. They achieved a fast migration by not reinventing the wheel and by providing an abstracted experience for developers to build on. Kubernetes made resource management easy. Envoy enabled communication over gRPC/HTTP2, circuit breaking, easy monitoring, and centralized configuration. Spinnaker made deployment pipelines with canaries, zonal rollouts, and health checking the default.
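To make one of those benefits concrete, here’s a minimal sketch of what circuit breaking does. This is not Envoy’s implementation, just a toy illustration of the behavior a sidecar proxy takes off your application code’s hands: after enough consecutive failures, stop hammering a sick upstream and fail fast until it has had time to recover.

```python
import time

class CircuitBreaker:
    """Toy circuit breaker: open after `max_failures` consecutive errors
    and reject calls until `reset_after` seconds have passed."""

    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        # While the circuit is open, fail fast instead of piling more
        # load onto an unhealthy upstream service.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: upstream marked unhealthy")
            self.opened_at = None  # half-open: allow a trial request
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

With a mesh like Envoy, every service gets this kind of protection (plus retries, timeouts, and metrics) for free, configured centrally instead of reimplemented per codebase.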
All this open source was a developer’s dream, but an SRE’s nightmare. So Snap also built a web app as an abstraction layer on top of the service mesh that exposed just enough configuration to make everyone happy. The “Switchboard” web app allowed basic configuration for retry logic, parallel requests, and the like, but left things like Envoy’s config API locked up. It’s a happy ending to a tough multi-year migration.
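Hypothetically, that kind of narrow interface might look something like the sketch below: a handful of per-service knobs that get expanded into the much larger mesh configuration underneath. The names and fields here are invented for illustration, not Snap’s actual Switchboard schema.

```python
from dataclasses import dataclass

@dataclass
class ServiceSettings:
    """Hypothetical 'Switchboard-style' settings: the few knobs a service
    owner may touch; everything else stays locked down by the platform team."""
    name: str
    timeout_ms: int = 500
    retry_attempts: int = 2
    max_parallel_requests: int = 100

def to_mesh_config(settings: ServiceSettings) -> dict:
    """Expand the simple settings into the (much larger) proxy config that
    the platform team actually manages. Illustrative structure only."""
    return {
        "service": settings.name,
        "route": {
            "timeout": f"{settings.timeout_ms}ms",
            "retry_policy": {"num_retries": settings.retry_attempts},
        },
        "circuit_breakers": {
            "max_requests": settings.max_parallel_requests,
        },
    }

print(to_mesh_config(ServiceSettings(name="friends-feed", retry_attempts=3)))
```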
Netflix's Video Processing Pipeline Transformation
At Netflix, the Video Processing Pipeline orchestrates workloads related to media encoding. For example, content is handed to Netflix by third parties as high-quality masters known as mezzanines, which need to be re-encoded into formats suitable for streaming. Originally, the pipeline was built as a single piece, but unintended coupling, long release cycles, slow innovation, and lots of unhappy developers led to a decision to re-architect.
However, instead of breaking the pipeline out into many services talking to each other via APIs, they built orchestrators that would call services as needed. One interesting benefit of this architecture was that they could write different orchestrators for different use cases: one for member streaming, which optimized for quality across millions of streams, and one for studio operations, which optimized for fast turnaround. The orchestrators could pick and choose services and call them in any order, with any configuration. So when the time came to launch the new ad-supported tier, it was significantly easier to modify the pipeline to support that use case.
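As a rough sketch of the pattern (the service and orchestrator names below are made up for illustration, not Netflix’s actual components), the same building-block services can be composed into very different pipelines by different orchestrators:

```python
# Toy "services": in reality each of these would be a call to a separate
# media-processing microservice rather than a local function.
def inspect(mezzanine):
    return {"source": mezzanine, "ok": True}

def encode(asset, profile):
    return {**asset, "encoded": profile}

def quality_check(asset, level):
    return {**asset, "qc": level}

def streaming_orchestrator(mezzanine):
    """Optimizes for playback quality: every profile, thorough QC."""
    asset = inspect(mezzanine)
    encoded = [encode(asset, p) for p in ("4k-hdr", "1080p", "720p")]
    return [quality_check(e, level="full") for e in encoded]

def studio_orchestrator(mezzanine):
    """Optimizes for turnaround: one quick proxy encode, light QC."""
    asset = inspect(mezzanine)
    return [quality_check(encode(asset, "720p-proxy"), level="quick")]

print(streaming_orchestrator("title_123.mov"))
print(studio_orchestrator("dailies_456.mov"))
```

Adding a new use case (say, an ad-supported tier) then means writing or tweaking an orchestrator rather than rewiring a web of point-to-point service calls.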
Uber's Thrift-y Approach to Service Contracts
When you create a microservice, it’s tough for your users to know what the request and response schemas should be, or even whether they should be JSON, RPC, or something else. It doesn’t end there: users of your service want guarantees around latency and availability, and they don’t want their contract with your service changing every time you push a new deployment. This is the problem Uber faced, and luckily one framework solved most of it: Thrift.
Thrift allowed services to be written in any language as long as their interfaces were defined in Thrift’s Interface Definition Language (IDL), which enabled efficient, safe, and reliable interoperability. It also provided a standard means of updating a service’s request or response schema and broadcasting the change to consumers. So each user of an Uber service knew what the request and response schemas were and could tell if a breaking change was pushed.
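For a sense of what that looks like from the caller’s side, here’s a hedged sketch of using a Thrift-generated client in Python. The transport and protocol setup is standard for the Python Thrift library; the `trips` module, `TripService`, and `TripRequest` are hypothetical stand-ins for whatever a team’s .thrift IDL file actually defines, not Uber’s real schemas.

```python
from thrift.transport import TSocket, TTransport
from thrift.protocol import TBinaryProtocol

# Hypothetical modules generated by the Thrift compiler from a .thrift file;
# the struct and service names are placeholders for illustration.
from trips import TripService
from trips.ttypes import TripRequest

transport = TTransport.TBufferedTransport(TSocket.TSocket("trips.internal", 9090))
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = TripService.Client(protocol)

transport.open()
try:
    # The generated client enforces the IDL contract: a typed request in,
    # a typed response out, regardless of what language the server uses.
    trip = client.getTrip(TripRequest(trip_id="abc-123"))
    print(trip.status)
finally:
    transport.close()
```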
Amazon’s Bold Switch to a Monolith Architecture
When it’s all one program, the different parts can communicate through memory. When you switch to a service-oriented architecture, they have to communicate over the network, which can introduce challenges with observability, cost, and cascading failures. Counter-intuitively, these new problems can arrive while the monolith’s original problems remain.
Historically at Amazon Prime Video, thousands of streams were monitored for quality by a distributed pipeline consisting of many services. One service would ingest a stream and convert it into frames, a couple would run different quality checks on the frames, and an orchestrator would manage the work. Each frame extracted from the live stream was pushed to an object store and then accessed again by each monitoring service, which quickly became too expensive. Unexpectedly, scaling was also an issue: there was only one orchestrator, and it had to manage each frame across the entire monitoring lifecycle, which created a bottleneck. So the team at Amazon boldly combined everything into one service and ran it on containers, reducing infrastructure costs by an astonishing 90%.
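A rough sketch of the difference (the function names are invented for illustration, not Prime Video’s actual components): in the combined service, frames stay in process memory and are passed directly to each detector, so nothing gets uploaded to and re-downloaded from an object store per frame.

```python
def decode_frames(stream_chunk):
    """Stand-in for the frame-splitting step; yields frames in memory."""
    for i in range(3):
        yield {"frame": i, "data": f"{stream_chunk}:{i}"}

def detect_block_corruption(frame):
    return False  # placeholder quality check

def detect_audio_video_sync(frame):
    return True  # placeholder quality check

def analyze_chunk(stream_chunk):
    """Single process: every detector reads the same in-memory frame,
    avoiding a per-frame round trip to an object store."""
    defects = []
    for frame in decode_frames(stream_chunk):
        if detect_block_corruption(frame):
            defects.append(("block_corruption", frame["frame"]))
        if not detect_audio_video_sync(frame):
            defects.append(("av_sync", frame["frame"]))
    return defects

print(analyze_chunk("live_stream_segment_0001"))
```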
Conclusion
A lot of projects start with a single codebase and service because it’s fast, easy, and doesn’t really cause problems until the company or project grows big. Until then, it allows standards to be shared across components, quicker ramp-up since there’s only one way of doing things, and simpler monitoring, testing, release, and rollback. Everyone’s happy until they aren’t. Maybe they notice components are so tightly coupled that changing the color of one button makes a different button disappear. Maybe there was one outage too many caused by a panic in an unimportant part of the application. Or maybe people just want to deploy without waiting for the whole product’s codebase to be frozen and released once every 2-4 weeks. That’s when it’s worth considering microservices, which break the application down into bite-sized pieces that each team can own and build according to its requirements.