A microservices architecture is a development method for designing applications as modular services that seamlessly adapt to a highly scalable and dynamic environment. Microservices help solve complex issues such as speed and scalability, while also supporting continuous testing and delivery. This Zone will take you through breaking down the monolith step by step and designing a microservices architecture from scratch. Stay up to date on the industry's changes with topics such as container deployment, architectural design patterns, event-driven architecture, service meshes, and more.
Developers relied on the monolithic architecture for a long time — and for a long time, it worked. Unfortunately, these architectures are composed of fewer, larger parts, which means they are more likely to fail in their entirety when a single part fails. Often, these applications ran as a single process, which only exacerbated the issue. Microservices solve these specific issues by running each microservice as a separate process. If one cog goes down, it doesn't necessarily mean the whole machine stops running. Plus, diagnosing and fixing defects in smaller, highly cohesive services is often easier than in larger monolithic ones.

Microservices design patterns provide tried-and-true fundamental building blocks for writing microservices code. By utilizing patterns during the development process, you save time and ensure a higher level of accuracy than writing the code for your microservices app from scratch. This article offers a comprehensive overview of the microservices design patterns you need to know, as well as when to apply them.

Key Benefits of Using Microservices Design Patterns

Microservices design patterns offer several key benefits, including:

Scalability: Microservices allow applications to be broken down into smaller, independent services, each responsible for a specific function or feature. This modular architecture enables individual services to be scaled independently based on demand, improving overall system scalability and resource utilization.

Flexibility and agility: Microservices promote flexibility and agility by decoupling different parts of the application. Each service can be developed, deployed, and updated independently, allowing teams to work autonomously and release new features more frequently. This flexibility enables faster time to market and easier adaptation to changing business requirements.

Resilience and fault isolation: Microservices improve system resilience and fault isolation by isolating failures to specific services. If one service experiences an issue or failure, it does not necessarily impact the entire application. This isolation minimizes downtime and improves system reliability, ensuring that the application remains available and responsive.

Technology diversity: Microservices enable technology diversity by allowing each service to be built using the most suitable technology stack for its specific requirements. This flexibility enables teams to choose the right tools and technologies for each service, optimizing performance, development speed, and maintenance.

Improved development and deployment processes: Microservices streamline development and deployment by breaking down complex applications into smaller, manageable components. This modular architecture simplifies testing, debugging, and maintenance tasks, making it easier for development teams to collaborate and iterate on software updates.

Scalability and cost efficiency: Microservices enable organizations to scale their applications more efficiently by allocating resources only to the services that require them. This granular approach to resource allocation helps optimize costs and ensures that resources are used effectively, especially in cloud environments where resources are billed based on usage.

Enhanced fault tolerance: Microservices architecture allows for better fault tolerance, as services can be designed to gracefully degrade or fail independently without impacting the overall system.
This ensures that critical functionalities remain available even in the event of failures or disruptions.

Easier maintenance and updates: Microservices simplify maintenance and updates by allowing changes to be made to individual services without affecting the entire application. This reduces the risk of unintended side effects and makes it easier to roll back changes if necessary, improving overall system stability and reliability.

Let's go ahead and look at the different microservices design patterns.

Database per Service Pattern

The database is one of the most important components of a microservices architecture, but it isn't uncommon for developers to overlook the database per service pattern when building their services. Database organization affects the efficiency and complexity of the application. The most common options a developer can choose from when determining the organizational architecture of an application are:

Dedicated Database for Each Service

A database dedicated to one service can't be accessed by other services, which makes each service much easier to scale and to understand from an end-to-end business perspective. Picture a scenario where your databases have different needs or access requirements: the data owned by one service may be largely relational, while a second service might be better served by a NoSQL solution and a third service may require a vector database. In this scenario, a dedicated database for each service helps you manage them more easily. This structure also reduces coupling, as one service can't tie itself to the tables of another. Services are forced to communicate via published interfaces. The downside is that dedicated databases require a failure protection mechanism for events where communication fails.

Single Database Shared by All Services

A single shared database isn't the standard for microservices architecture but bears mentioning as an alternative nonetheless. Here, the issue is that microservices using a single shared database lose many of the key benefits developers rely on, including scalability, robustness, and independence. Still, sharing a physical database may be appropriate in some situations. When a single database is shared by all services, though, it's very important to enforce logical boundaries within it. For example, each service should own its schema, and read/write access should be restricted to ensure that services can't poke around where they don't belong.

Saga Pattern

A saga is a series of local transactions. In microservices applications, the saga pattern helps maintain data consistency during distributed transactions: instead of one large distributed transaction, each step is a local transaction with a compensating action, which provides rollback opportunities. A common scenario is an e-commerce application that allows customers to purchase products using credit. Data may be stored in two different databases: one for orders and one for customers. The purchase amount can't exceed the credit limit. To implement the saga pattern, developers can choose between two common approaches.

1. Choreography

Using the choreography approach, a service performs a transaction and then publishes an event. Other services respond to those published events and perform tasks according to their coded instructions. These secondary tasks may or may not also publish events of their own.
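To make the flow concrete, below is a minimal, in-memory sketch of choreography for the credit-limit scenario above. The event bus, the service classes, and the event names are illustrative assumptions rather than any particular framework; in a real system the bus would typically be a message broker such as Kafka or RabbitMQ, and each service would own its own database.

```python
# A minimal, in-memory sketch of choreography: services react to each
# other's events with no central coordinator. All names are illustrative.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self.handlers[event_type]:
            handler(payload)

class CustomerService:
    def __init__(self, bus, credit_limit=100):
        self.bus = bus
        self.credit = credit_limit
        bus.subscribe("OrderCreated", self.reserve_credit)

    def reserve_credit(self, order):
        if order["amount"] <= self.credit:
            self.credit -= order["amount"]
            self.bus.publish("CreditReserved", order)
        else:
            self.bus.publish("CreditLimitExceeded", order)

class OrderService:
    def __init__(self, bus):
        self.bus = bus
        self.orders = {}
        bus.subscribe("CreditReserved", self.approve)
        bus.subscribe("CreditLimitExceeded", self.reject)

    def create_order(self, order_id, amount):
        self.orders[order_id] = "PENDING"
        self.bus.publish("OrderCreated", {"id": order_id, "amount": amount})

    def approve(self, order):
        self.orders[order["id"]] = "APPROVED"

    def reject(self, order):  # the compensating local transaction
        self.orders[order["id"]] = "REJECTED"

bus = EventBus()
orders, customers = OrderService(bus), CustomerService(bus)
orders.create_order("o-1", 80)   # approved: within the credit limit
orders.create_order("o-2", 80)   # rejected: credit exhausted
print(orders.orders)             # {'o-1': 'APPROVED', 'o-2': 'REJECTED'}
```

Note that neither service ever calls the other directly; each only reacts to published events, which is exactly what keeps the coupling loose.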
In the e-commerce example, a choreography approach means each local transaction publishes an event that triggers the corresponding local transaction in the credit service.

Benefits of Choreography

Having explained the term itself, let's take a closer look at the benefits of using a choreographed pattern in a microservice architecture. The most important ones are outlined below:

Loose coupling: Choreography allows microservices to be loosely coupled, which means they can operate independently and asynchronously without depending on a central coordinator. This can make the system more scalable and resilient, as the failure of one microservice will not necessarily affect the others.

Ease of maintenance: Choreography allows microservices to be developed and maintained independently, which can make it easier to update and evolve the system.

Decentralized control: Choreography allows control to be decentralized, which can make the system more resilient and less prone to failure.

Asynchronous communication: Choreography allows microservices to communicate asynchronously, which can be more efficient and scalable than synchronous communication.

Overall, choreography can be a useful design pattern for building scalable, resilient, and maintainable microservice architectures. That said, some of these benefits can turn into drawbacks.

2. Orchestration

An orchestration approach performs transactions and publishes events using an object that orchestrates the events, triggering other services to respond by completing their tasks. The orchestrator tells the participants what local transactions to execute. Saga is a complex design pattern that requires a high level of skill to implement successfully. However, the benefit of proper implementation is maintained data consistency across multiple services without tight coupling.

Benefits of Orchestration

Orchestration in microservice architectures brings benefits that compensate for the drawbacks of a choreographed system. A few of them are explained below:

Simplicity: Orchestration can be simpler to implement and maintain than choreography, as it relies on a central coordinator to manage and coordinate the interactions between the microservices.

Centralized control: With a central coordinator, it is easier to monitor and manage the interactions between the microservices in an orchestrated system.

Visibility: Orchestration allows for a holistic view of the system, as the central coordinator has visibility into all of the interactions between the microservices.

Ease of troubleshooting: With a central coordinator, it is easier to troubleshoot issues in an orchestrated system.

When to Use Orchestration vs. Choreography

Whether to use choreography or orchestration in your microservice architecture should always be a well-thought-out choice. Both approaches bring advantages as well as downsides.
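For contrast with the choreography sketch earlier, here is a minimal sketch of the orchestration approach under the same assumptions: a central saga object runs each local transaction in order and triggers the compensating actions in reverse order if any step fails. The step and compensation functions are illustrative placeholders; production systems often delegate this to a workflow engine or state machine.

```python
# A minimal sketch of an orchestrated saga: a central coordinator runs each
# local transaction and rolls back completed steps on failure.
class SagaOrchestrator:
    def __init__(self):
        self.steps = []  # (action, compensation) pairs

    def add_step(self, action, compensation):
        self.steps.append((action, compensation))

    def execute(self, ctx):
        done = []
        for action, compensation in self.steps:
            try:
                action(ctx)
                done.append(compensation)
            except Exception:
                for comp in reversed(done):  # compensate in reverse order
                    comp(ctx)
                return "ROLLED_BACK"
        return "COMPLETED"

def create_order(ctx): ctx["order"] = "CREATED"
def cancel_order(ctx): ctx["order"] = "CANCELLED"
def reserve_credit(ctx):
    if ctx["amount"] > ctx["credit_limit"]:
        raise RuntimeError("credit limit exceeded")

saga = SagaOrchestrator()
saga.add_step(create_order, cancel_order)
saga.add_step(reserve_credit, lambda ctx: None)  # nothing to undo
print(saga.execute({"amount": 80, "credit_limit": 100}))   # COMPLETED
print(saga.execute({"amount": 150, "credit_limit": 100}))  # ROLLED_BACK
```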
API Gateway Pattern

For large applications with multiple clients, implementing an API gateway pattern is a compelling option. One of the largest benefits is that it insulates the client from needing to know how services have been partitioned. However, different teams will value the API gateway pattern for different reasons. One of these possible reasons is that it grants a single entry point for a group of microservices by working as a reverse proxy between client apps and the services. Another is that service boundaries can evolve independently, since the client knows nothing about them; the client also doesn't need to know how to find or communicate with a multitude of ever-changing services. You can also create a gateway for specific types of clients (for example, backends for frontends), which improves ergonomics and reduces the number of round trips needed to fetch data. Plus, an API gateway can take care of crucial tasks like authentication, SSL termination, and caching, which makes your app more secure and user-friendly.

Before moving on to the next pattern, there's one more benefit to cover: security. The primary way the pattern improves security is by reducing the attack surface area. By providing a single entry point, the API endpoints aren't directly exposed to clients, and authorization and SSL can be efficiently implemented.

Developers can also use this design pattern to decouple internal microservices from client apps so that a partially failed request can still be used. This ensures a whole request won't fail because a single microservice is unresponsive: the API gateway uses its cache to provide an empty response or return a valid error code.

Circuit Breaker Design Pattern

This pattern is usually applied between services that communicate synchronously. A developer might decide to utilize a circuit breaker when a service is exhibiting high latency or is completely unresponsive. The utility here is that failure across multiple systems is prevented when a single microservice is unresponsive: calls won't pile up and consume system resources, which could otherwise cause significant delays within the app or even a string of service failures.

Implementing this pattern requires an object that is called to monitor failure conditions. When a failure condition is detected, the circuit breaker trips. Once tripped, all calls to the circuit breaker result in an error and are directed to a different service. Alternatively, calls can result in a default error message being returned.

There are three states of the circuit breaker that developers should be aware of:

Open: A circuit breaker is open when the number of failures has exceeded the threshold. In this state, the microservice returns errors for calls without executing the desired function.

Closed: When a circuit breaker is closed, it's in the default state and all calls are responded to normally. This is the ideal state developers want a circuit breaker to remain in — in a perfect world, of course.

Half-open: While a circuit breaker is checking for underlying problems, it remains in a half-open state. Some calls may be responded to normally, but some may not be, depending on why the circuit breaker switched to this state initially.
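The three states above map naturally to a small state machine. The sketch below illustrates the idea; the threshold and timeout values are assumptions, and in practice you would more likely reach for an established library (such as resilience4j on the JVM or pybreaker in Python) than hand-roll one.

```python
# A sketch of the three-state circuit breaker described above.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds before probing again
        self.failures = 0
        self.state = "CLOSED"
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"  # let one call probe the service
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"  # trip (or re-trip) the breaker
                self.opened_at = time.monotonic()
            raise
        else:
            self.failures = 0
            self.state = "CLOSED"  # a success closes the breaker
            return result

breaker = CircuitBreaker()
# breaker.call(some_client.fetch, "item-42")  # wrap any synchronous call
```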
Command Query Responsibility Segregation (CQRS)

A developer might use the command query responsibility segregation (CQRS) design pattern as a solution to traditional database issues like data contention risk. CQRS also suits situations where app performance and security requirements are complex and objects are exposed to both reading and writing transactions.

The way this works is that an operation either changes the state of an entity (a command) or returns a result (a query), but never both. Multiple views can be provided for query purposes, and the read side of the system can be optimized separately from the write side. This shift reduces complexity by separating the query and command models so that:

The write side of the model handles persistence events and acts as a data source for the read side

The read side of the model generates projections of the data, which are highly denormalized views

Asynchronous Messaging

If a service doesn't need to wait for a response and can continue running its code after a failure, asynchronous messaging can be used. Using this design pattern, microservices communicate in a way that's fast and responsive; sometimes the pattern is referred to as event-driven communication. To achieve the fastest, most responsive app, developers can use a message queue to maximize efficiency while minimizing response delays. This pattern helps connect multiple microservices without creating dependencies or tightly coupling them. While there are tradeoffs with async communication (such as eventual consistency), it's still a flexible, scalable approach to designing a microservices architecture.

Event Sourcing

The event sourcing design pattern is used in microservices when a developer wants to capture all changes in an entity's state. Event stores like Kafka or alternatives keep track of event changes and can even function as a message broker, which facilitates communication between different microservices, monitoring messages and ensuring communication is reliable and stable. To support this, the event sourcing pattern stores a series of state-changing events and can reconstruct the current state of an entity by replaying those events (a short sketch appears at the end of this article). Event sourcing is a viable option in microservices when transactions are critical to the application, and it also works well when changes to the existing data layer codebase need to be avoided.

Strangler-Fig Pattern

Developers mostly use the strangler-fig design pattern to incrementally transform a monolith application into microservices. This is accomplished by replacing old functionality with a new service — and, consequently, this is how the pattern receives its name. Once the new service is ready to be executed, the old service is "strangled" so the new one can take over. To accomplish this transfer from monolith to microservices, developers use a facade interface that allows them to expose individual services and functions. The targeted functions are broken free from the monolith so they can be "strangled" and replaced.

Utilizing Design Patterns To Make Organization More Manageable

Setting up the proper architecture and process tooling will help you create a successful microservice workflow. Use the design patterns described above and learn more about microservices in my blog to create a robust, functional app.
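Finally, here is the event sourcing sketch promised above: an append-only event log from which an entity's current state is reconstructed by replay. The account entity and event names are invented for illustration; a real system would persist the log in a store such as Kafka or a database table.

```python
# A sketch of event sourcing: state changes are stored as events, and the
# current state is rebuilt by replaying them.
class EventStore:
    def __init__(self):
        self.events = []  # append-only log of (entity_id, event_type, amount)

    def append(self, entity_id, event_type, amount):
        self.events.append((entity_id, event_type, amount))

    def replay(self, entity_id):
        """Rebuild an account balance by replaying its events in order."""
        balance = 0
        for eid, event_type, amount in self.events:
            if eid != entity_id:
                continue
            if event_type == "Deposited":
                balance += amount
            elif event_type == "Withdrawn":
                balance -= amount
        return balance

store = EventStore()
store.append("acct-1", "Deposited", 100)
store.append("acct-1", "Withdrawn", 30)
print(store.replay("acct-1"))  # 70 — current state derived from the log
```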
Origin of Cell-Based Architecture

In the rapidly evolving domain of digital services, the need for scalable and resilient architectures (resilience being the system's ability to recover quickly from failure) has peaked. The introduction of cell-based architecture marks a pivotal shift tailored to meet the surging demands of hyper-scaling (an architecture's ability to scale rapidly in response to fluctuating demand). This methodology has become a foundation for digital success. It's a strategy that empowers tech behemoths like Amazon and Facebook, along with service platforms such as DoorDash, to skillfully navigate the tidal waves of digital traffic during peak moments and serve millions of users worldwide without a hitch.

Consider the surge Amazon faces on Prime Day or the global traffic spike Facebook navigates during significant events. Similarly, DoorDash's quest to flawlessly handle a flood of orders showcases a recurring theme: the critical need for an architecture that scales vertically and horizontally — expanding capacity without sacrificing system integrity or the user experience.

In the current landscape, where startups frequently encounter unprecedented growth rates, the dream of scaling quickly can become a nightmare of scalability issues. Hypergrowth — rapid expansion that surpasses expectations — presents a formidable challenge, risking a company's collapse if it fails to scale efficiently. This challenge birthed the concept of hyperscaling, emphasizing an architecture's nimbleness in adapting and growing to meet dynamic demands. Essential to this strategy are extensive parallelization and rigorous fault isolation, ensuring companies can scale without succumbing to the pitfalls of rapid growth.

Cell-based architecture emerges as a beacon for applications and services where downtime is not an option. In scenarios where every second of inactivity spells significant reputational or financial loss, this architectural paradigm proves invaluable. It is especially crucial for:

Applications requiring uninterrupted operation to ensure customer satisfaction and maintain business continuity.

Financial services vital for maintaining economic stability.

Ultra-scale systems where failure is an unthinkable option.

Multi-tenant services requiring segregated resources for specific clients.

This architectural innovation was developed in direct response to the increasing needs of modern, rapidly expanding digital services. It provides a scalable, resilient framework supporting continuous service delivery and operational superiority.

Understanding Cell-Based Architecture

What Exactly Is Cell-Based Architecture?

Cell-based architecture is a modern approach to creating digital services that are both scalable and resilient, taking cues from the principles of distributed systems and microservices design patterns. This architecture breaks down an extensive system into smaller, independent units called cells. Each cell is self-sufficient, containing a specific segment of the system's functionality: data storage, compute, application logic, and dependencies. This modular setup allows each cell to be scaled, deployed, and managed independently, enhancing the system's ability to grow and recover from failures without widespread impact.
Drawing an analogy to urban planning, consider cell-based architecture akin to a well-designed metropolis where each neighborhood operates autonomously, equipped with its own services and amenities, yet contributes to the city's overall prosperity. In times of disruption, such as a power outage or a water main break, only the affected neighborhood experiences downtime while the rest of the city thrives. Just as a single neighborhood can experience disruption without paralyzing the entire city, a cell encountering an issue in this architectural framework does not trigger a system-wide failure. This ensures the digital service remains robust and reliable, maintaining high uptime and resilience.

Fig. 1: Cell-Based Architecture

Key Components

Cell: Akin to neighborhoods, cells are the foundational building blocks of this architecture. Each cell is an autonomous microservice cluster with resources capable of handling a subset of service responsibilities: a stand-alone version of the application with its own computing power, load balancer, and databases. This setup allows each cell to be deployed, monitored, and maintained separately. That independence means that if one cell runs into problems, it doesn't affect the others, which helps the system scale effectively and stay robust.

Cell Router: Cell routers play a critical role similar to a city's traffic management system. They dynamically route requests to the most appropriate cell based on factors such as load, geographic location, or specific service requirements. By efficiently balancing the load across various cells, cell routers ensure that each request is processed by the cell best suited to handle it, optimizing system performance and the user experience, much like how traffic lights and signs direct the flow of vehicles to ensure smooth transit within a city.

Inter-Cell Communication Layer: Despite the autonomy of individual cells, cooperation between them is essential for handling tasks across the system. The inter-cell communication layer facilitates secure and efficient message exchange between cells. This layer acts as the public transportation system in our city analogy, connecting different neighborhoods (cells) to ensure seamless collaboration and unified service delivery across the entire architecture. It ensures that even as cells operate independently, they can still work together effectively.

Control Plane: The control plane is a critical component of cell-based architecture, acting as the central hub for administrative operations. It oversees tasks such as setting up new cells (provisioning), shutting down existing cells (de-provisioning), and moving customers between cells (migration). This keeps the infrastructure responsive to the needs of the system and its users, allowing for dynamic resource allocation and seamless service continuity.

Why and When to Use Cell-Based Architecture?

Why Use It?
Cell-based architecture offers a robust framework for efficiently scaling digital services while guaranteeing their resilience and adaptability during expansion. Below is a breakdown of its advantages:

Higher Scalability: By defining and managing the capacity of each cell, you can add more cells to scale out (handling growth by adding more system components, such as databases and servers, and spreading the workload evenly). This avoids hitting the resource limits that come with scaling up (accommodating growth by increasing the size of a system's components, such as a database, server, or subsystem). As demand grows, you add more cells, each a contained unit with known capacity, making the system inherently scalable.

Safer Deployments: Deployments and rollbacks are smoother with cells. You can deploy changes to one cell at a time, minimizing the impact of any issues. Canary cells can be used to test new deployments under actual conditions with minimal risk, providing a safety net for broader deployment.

Easy Testability: Testing large, spread-out systems is challenging, especially as they grow. With cell-based architecture, each cell is kept to a manageable size, making it much simpler to test how it behaves at its largest capacity. Testing a whole large service can be too expensive and complex, but testing a single cell is doable: you can simulate the largest amount of work the cell will handle, similar to the heaviest load a single customer might put on your application. This makes it practical and cost-effective to ensure each cell runs smoothly.

Lower Blast Radius: Cell-based architecture limits the spread of failures by isolating issues within individual cells, much like neighborhoods in a city. This division ensures that a problem in one cell doesn't affect the entire system, maintaining overall functionality. Each cell operates independently, minimizing any single incident's impact area, or "blast radius," akin to the regional isolation seen in large-scale services. This setup enhances system resilience by keeping disruptions contained and preventing widespread outages.

Fig. 2: Cell-based architecture services exhibit enhanced resilience to failures and feature a reduced blast radius compared to traditional services

Improved Reliability and Recovery

Higher Mean Time Between Failures (MTBF): Cell-based architecture increases the system's reliability by reducing how often problems occur. The design keeps each cell small and manageable, allowing for regular checks and maintenance, which smooths operations and makes them more predictable. With customers distributed across different cells, any issue affects only a limited set of requests and users. Changes are tested on just a few cells at a time, making them easy to revert without widespread impact. For example, if you have customers divided across ten cells, a problem in one cell affects only 10% of your customers. This controlled approach to managing changes and addressing issues quickly means the system experiences fewer disruptions, leading to a more stable and reliable service.

Lower Mean Time to Recovery (MTTR): Recovery is quicker and more straightforward with cells, since you deal with a smaller, contained issue rather than a system-wide problem.

Higher Availability: Cell-based architecture can lead to fewer and shorter failures, improving the overall uptime of your service.
Even though there might be more potential points of failure (each cell could theoretically fail), the impact of each failure is significantly reduced, and failures are easier to fix.

When to Use It?

Here's a brief guide to when this architectural strategy is advantageous:

High-Stakes Applications: If downtime could severely impact your customers, tarnish your reputation, or result in substantial financial loss, a cell-based approach can safeguard against widespread disruptions.

Critical Economic Infrastructure: Cell-based architecture ensures continuous operation for financial services industry (FSI) workloads that are pivotal to economic stability.

Ultra-Scale Systems: Systems too large or critical to fail — those that must maintain operation under almost any circumstance — are prime candidates for cell-based design.

Stringent Recovery Objectives: Cell-based architecture offers quick recovery capabilities for workloads requiring a Recovery Point Objective (RPO) of less than 5 seconds and a Recovery Time Objective (RTO) of less than 30 seconds.

Multi-Tenant Services with Dedicated Needs: For services where tenants demand fully dedicated resources, assigning each such tenant its own cell ensures isolation and dedicated performance.

Although cell-based architecture brings considerable benefits to handling critical workloads, it also comes with its own hurdles, such as heightened complexity, elevated costs, the necessity for specialized tools and practices, and the need for investment in a routing layer. For a more in-depth analysis of these challenges, see the "Weighing the Scales: Benefits and Challenges" section below.

Implementing Cell-Based Architecture

This section highlights critical design factors that come into play when designing and implementing a cell-based architecture.

Designing a Cell

Cell design is a foundational aspect of cell-based architecture, where a system is divided into smaller, self-contained units known as cells. Each cell operates independently with its own resources, making the entire system more scalable and resilient. To embark on cell design, identify distinct functionalities within your system that can be isolated into individual cells. This might involve grouping services by their operational needs or user base. Once you've defined these boundaries, equip each cell with the necessary resources, such as databases and application logic, to ensure it can function autonomously. This setup facilitates targeted scaling and recovery and minimizes the impact of failures, as issues in one cell won't spill over to others. Implementing effective communication channels between cells and establishing comprehensive monitoring are crucial steps to maintain system cohesion and oversee cell performance. By systematically organizing your architecture into cells, you create a robust framework that enhances the manageability and adaptability of your system.

Here are a few ideas on cell design that can be leveraged to bolster system resilience:

Distribute Cells Across Availability Zones: By positioning cells across different availability zones (AZs), you can protect your system against the failure of a single data center or geographic location. This geographical distribution ensures that even if one AZ encounters issues, cells in other AZs can continue to operate, maintaining overall system availability and reducing the risk of complete service downtime.
Implement Redundant Cell Configurations: Creating redundant copies of cells within and across AZs can further enhance resilience. This redundancy means that if one cell fails, its responsibilities can be immediately taken over by a duplicate cell, minimizing service disruption. This approach requires careful synchronization between cells to ensure data consistency, but significantly improves fault tolerance.

Design Cells for Autonomous Operation: Ensuring that each cell can operate independently, with its own set of resources, databases, and application logic, is crucial. This independence allows cells to be isolated from failures elsewhere in the system. Even if one cell experiences a problem, it won't spread to others, localizing the impact and making it easier to identify and rectify issues.

Use Load Balancers and Cell Routers Strategically: Integrating load balancers and cell routers that are aware of cell locations and health statuses can help efficiently redirect traffic away from troubled cells or AZs. This dynamic routing capability allows for real-time adjustments to traffic flow, directing users to the healthiest available cells and balancing the load to prevent overburdening any single cell or AZ.

Facilitate Easy Cell Replication and Deployment: Design cells with replication and redeployment in mind. In case of a cell or AZ failure, having mechanisms for quickly spinning up new cells in alternative locations can be invaluable. Automation tools and templates for cell deployment can expedite this process, reducing recovery times and enhancing overall system resilience.

Regularly Test Failover Processes: Regular testing of cell failover processes, including simulated failures and recovery drills, can ensure that your system responds as expected during actual outages. These tests can reveal potential weaknesses in your cell design and failover strategies, allowing for continuous improvement of system resilience.

By incorporating these ideas into your cell design, you can create a more resilient system capable of withstanding various failure scenarios while minimizing the impact on service availability and performance.

Cell Partitioning

Cell partitioning is a crucial technique in cell-based architecture. It focuses on dividing a system's workload among distinct cells to optimize performance, scalability, and resilience. It involves categorizing and directing user requests or data to specific cells based on predefined criteria. This process ensures no cell becomes overwhelmed, enhancing system reliability and efficiency.

How Cell Partitioning Can Be Done:

Identify Partition Criteria: Determine the basis for distributing workloads among cells. Typical criteria include geographic location, user ID, request type, or date range. This step is pivotal in defining how the system categorizes and routes requests to the appropriate cells.

Implement Routing Logic: Develop a routing mechanism within the cell router or API gateway that uses the identified criteria to direct incoming requests to the correct cell. This might involve dynamic decision-making algorithms that consider current cell load and availability.

Continuous Monitoring and Adjustment: Regularly monitor the performance and load distribution across cells. Use this data to adjust partitioning criteria and routing logic to maintain optimal system performance and scalability.
Partitioning Algorithms: Several algorithms can be utilized for effective cell partitioning, each with its own strengths and suited to different types of workloads and system requirements:

Consistent Hashing: Requests are distributed based on the hash values of the partition key (e.g., user ID), ensuring even workload distribution and minimal reorganization when cells are added or removed.

Range-Based Partitioning: Divides data into ranges (e.g., alphabetical or numerical) and assigns each range to a specific cell. This is ideal for ordered data, allowing efficient query operations.

Round Robin: Distributes requests evenly across all available cells in a cyclic manner. It is straightforward and helpful in achieving a basic level of load balancing.

Sharding: Similar to range-based partitioning but more complex, sharding involves splitting large databases into smaller, faster, more easily managed parts, or "shards," each handled by a separate cell.

Dynamic Partitioning: Adjusts partitioning in real time based on workload characteristics or system performance metrics. This approach requires advanced algorithms capable of analyzing system state and making immediate adjustments.

By thoughtfully implementing cell partitioning and choosing the appropriate algorithm, you can significantly enhance your cell-based architecture's performance, scalability, and resilience. Regular review and adjustment of your partitioning strategy ensures it continues to meet your system's evolving needs.
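To illustrate the first of these algorithms, here is a small consistent-hashing sketch that maps a partition key to a cell. The cell names, the virtual-node count, and the use of MD5 are arbitrary assumptions; the point is that a given key always lands on the same cell, and adding or removing a cell remaps only a small fraction of keys.

```python
# A sketch of consistent-hash partitioning for a cell router: each cell is
# placed at many points on a hash ring, and a key routes to the next point.
import bisect
import hashlib

class ConsistentHashRouter:
    def __init__(self, cells, vnodes=100):
        self.ring = []  # sorted (hash, cell) points on the ring
        for cell in cells:
            for i in range(vnodes):  # virtual nodes smooth the distribution
                self.ring.append((self._hash(f"{cell}#{i}"), cell))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def route(self, partition_key):
        idx = bisect.bisect(self.keys, self._hash(partition_key)) % len(self.ring)
        return self.ring[idx][1]

router = ConsistentHashRouter(["cell-a", "cell-b", "cell-c"])
print(router.route("user-42"))  # the same user always lands in the same cell
```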
Implementing a Cell Router

In cell-based architecture, the cell router is crucial for steering traffic to the correct cells, ensuring efficient workload management and scalability. An effective cell router hinges on two key elements: traffic routing logic and failover strategies, which maintain system reliability and optimize performance.

Implementing Traffic Routing Logic: Start by defining the criteria for how requests are directed to various cells, including the users' geographic location, the type of request, and the specific services needed. The aim is to reduce latency and evenly distribute the load. Employ dynamic routing that adapts to cell availability and workload changes in real time, possibly through integration with a service discovery tool that monitors each cell's status and location.

Establishing Failover Strategies: Solid failover processes are essential for the cell router to ensure the system's dependability. Should any cell become unreachable, the router must automatically reroute traffic to the next available cell with minimal manual intervention. This is achieved by implementing health checks across cells to swiftly identify and respond to failures, keeping the user experience smooth and the service highly available even during cell outages.

Fig. 3: The cell router ensures a smooth user experience by redirecting traffic to healthy cells during outages, maintaining uninterrupted service availability

For the practical implementation of a cell router, you can take one of the following approaches:

Load Balancers: Use cloud-based load balancers that dynamically direct traffic based on specific request attributes, such as URL paths or headers, according to set rules.

API Gateways: An API gateway can serve as the primary entry point for all incoming requests and route them to the appropriate cell based on configured logic.

Service Mesh: A service mesh offers a network layer that facilitates efficient service-to-service communication, routing requests based on policies, service discovery, and health status.

Custom Router Service: Developing a custom service allows routing decisions based on detailed request content, current cell load, or bespoke business logic, offering tailored control over traffic management.

Choosing the right implementation strategy for a cell router depends on specific needs, such as the granularity of routing decisions, integration capabilities with existing systems, and management simplicity. Each method provides a different degree of control, complexity, and adaptability to cater to distinct architectural requirements.

Cell Sizing

Cell sizing in a cell-based architecture refers to determining each cell's optimal size and capacity to ensure it can handle its designated workload effectively without being overburdened. Proper cell sizing is crucial for several reasons:

Balanced Load Distribution: Correctly sized cells help achieve a balanced distribution of workloads across the system, preventing any single cell from becoming a bottleneck.

Scalability: Well-sized cells can scale more efficiently. As demand increases, the system can add more cells or adjust resources within existing cells to accommodate growth.

Resilience and Recovery: Smaller, well-defined cells can isolate failures more effectively, limiting the impact of any single point of failure. This makes the system more resilient and simplifies recovery processes.

Cost Efficiency: Optimizing cell size helps utilize resources more efficiently, avoiding unnecessary expenditure on underutilized capacity.

How Is Cell Sizing Done?

Cell sizing involves a careful analysis of several factors:

Workload Analysis: Understand the nature and volume of each cell's workload, including peak demand times, data throughput, and processing requirements.

Resource Requirements: Based on the workload analysis, estimate the resources (CPU, memory, storage) each cell needs to operate effectively under various conditions.

Performance Metrics: Consider the key performance indicators (KPIs) that define successful cell operation. These could include response times, error rates, and throughput.

Scalability Goals: Define how the system should scale in response to increased demand. This will influence whether cells should be designed to scale up (increase resources in a cell) or scale out (add more cells).

Testing and Adjustment: Validate cell size assumptions by testing under simulated workload conditions. Monitoring real-world performance and adjusting as needed is a continuous part of cell sizing.

Effective cell sizing often involves a combination of theoretical analysis and empirical testing. Starting with a best-guess estimate based on workload characteristics and adjusting based on observed performance ensures that cells remain efficient, responsive, and cost-effective as the system evolves.
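As a back-of-the-envelope illustration of the analysis above, the sketch below estimates how many cells a workload needs from peak demand, per-cell capacity, and a headroom target. All of the numbers are assumptions; in practice they would come from your workload analysis and load tests.

```python
# A rough cell-sizing estimate: every number here is an assumed input.
import math

def cells_needed(peak_rps, cell_capacity_rps, headroom=0.3):
    """Cells required to serve peak traffic while keeping spare capacity
    per cell, plus one extra cell for failover (N+1)."""
    usable = cell_capacity_rps * (1 - headroom)  # e.g., keep 30% spare
    return math.ceil(peak_rps / usable) + 1

print(cells_needed(peak_rps=50_000, cell_capacity_rps=8_000))  # -> 10
```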
Cell Deployment

Cell deployment in a cell-based architecture is the process of distributing and managing your application's workload across multiple self-contained cells. This strategy ensures scalability, resilience, and efficient resource use. Here's a concise guide to how it's typically done and the technology choices available for effective implementation.

How Is Cell Deployment Done?

Automated Deployment Pipelines: Start by setting up automated deployment pipelines. These pipelines handle your application's packaging, testing, and deployment to the various cells. Automation ensures consistency, reduces errors, and enables rapid deployment across cells.

Blue/Green Deployments: Use blue/green deployment strategies to minimize downtime and reduce risk. By deploying the new version of your application to a separate environment (green) while keeping the current version (blue) running, you can switch traffic to the latest version once it's fully ready and tested.

Canary Releases: Gradually roll out updates to a small subset of cells or users before making them available system-wide. This allows you to monitor the impact of changes and roll them back if necessary without affecting all users.

Technology Choices for Cell Deployment:

Container Orchestration Tools: Tools such as Kubernetes, AWS ECS, and Docker Swarm are crucial for orchestrating cell deployments, enabling the encapsulation of applications into containers for streamlined deployment, scaling, and management across various cells.

CI/CD Tools: Continuous integration and continuous deployment (CI/CD) tools such as Jenkins, GitLab CI, CircleCI, and AWS CodePipeline automate testing and deployment processes, ensuring that new code changes can be rolled out efficiently.

Infrastructure as Code (IaC): Tools like Terraform and AWS CloudFormation allow you to define your infrastructure in code, making it easier to replicate and deploy cells across different environments or cloud providers.

Service Meshes: Service meshes like Istio or Linkerd provide advanced traffic management capabilities, including canary deployments and service discovery, which are crucial for managing communication and cell updates.

By leveraging these deployment strategies and technologies, you can achieve a high degree of automation and control in your cell deployments, ensuring your application remains scalable, reliable, and easy to manage.

Cell Observability

Cell observability is crucial in a cell-based architecture to ensure you have comprehensive visibility into each cell's health, performance, and operational metrics. It allows you to monitor, troubleshoot, and optimize the system effectively, enhancing overall reliability and the user experience.

Implementing Cell Observability: To achieve thorough cell observability, focus on three key areas: logging, monitoring, and tracing. Logging captures detailed events and operations within each cell. Monitoring tracks key performance indicators and health metrics in real time. Tracing follows requests as they move through the cells, identifying bottlenecks or failures in the workflow.

Technology Choices for Cell Observability:

Logging Tools: Solutions like Elasticsearch, Logstash, and Kibana (the ELK Stack) or Splunk provide powerful logging capabilities, allowing you to aggregate and analyze logs from all cells centrally.

Monitoring Solutions: Prometheus, coupled with Grafana for visualization, offers robust monitoring with support for custom metrics. Cloud-native services like Amazon CloudWatch or Google Cloud Operations (formerly Stackdriver) provide integrated monitoring solutions tailored for applications deployed on their respective cloud platforms.

Distributed Tracing Systems: Tools like Jaeger, Zipkin, and AWS X-Ray enable distributed tracing, helping you understand the flow of requests across cells and identify latency issues or failures in microservice interactions.
Service Meshes: Service meshes such as Istio or Linkerd inherently offer observability features, including monitoring, logging, and tracing of requests between cells, without requiring changes to your application code.

By leveraging these tools and focusing on comprehensive observability, you can ensure that your cell-based architecture remains performant, resilient, and capable of supporting your application's dynamic needs.

Weighing the Scales: Benefits and Challenges

Adopting cell-based architecture transforms the structural and operational dynamics of digital services. Breaking down a service into independently scalable and resilient units (cells) offers a robust framework for managing complexity and ensuring system availability. However, this architectural paradigm also introduces new challenges and complexities. Here's a deeper dive into the technical advantages and considerations.

Benefits

Horizontal Scalability: Unlike traditional scale-up approaches, cell-based architecture enables horizontal scaling by adding more cells. This method alleviates common bottlenecks associated with centralized databases or shared resources, allowing for linear scalability as user demand increases.

Fault Isolation and Resilience: The architecture's compartmentalized design ensures that failures are contained within individual cells, significantly reducing the system's overall blast radius. This isolation enhances the system's resilience, as issues in one cell can be mitigated or repaired without impacting the entire service.

Deployment Agility: Leveraging cells allows for incremental deployments and feature rollouts, akin to rolling updates across microservices. This granularity in deployment strategy minimizes downtime and enables a more flexible response to market or user demands.

Simplified Operational Complexity: While the initial setup is complex, the ongoing operation and management of cells can be more straightforward than in monolithic architectures. Each cell's autonomy simplifies monitoring, troubleshooting, and scaling efforts, as operational tasks can be executed in parallel across cells.

Challenges (Considerations)

Architectural Complexity: Transitioning to or implementing cell-based architecture demands a meticulous design phase, focusing on defining cell boundaries, data partitioning strategies, and inter-cell communication protocols. This complexity requires a deep understanding of distributed systems principles and may necessitate a shift in development and operational practices.

Resource and Infrastructure Overhead (Higher Cost): Each cell operates with its own set of resources and infrastructure, potentially leading to increased overhead compared to shared-resource models. Optimizing resource utilization and cost efficiency becomes paramount, especially as the number of cells grows.

Inter-Cell Communication Management: Ensuring coherent and efficient communication between cells without introducing tight coupling or significant latency is a critical challenge. Developers must design a communication layer that supports the necessary interactions while maintaining cells' independence and avoiding performance bottlenecks.

Data Consistency and Synchronization: Maintaining data consistency across cells, especially in scenarios requiring global state or real-time data synchronization, adds another layer of complexity. Implementing strategies like event sourcing, CQRS (command query responsibility segregation), or distributed sagas may be necessary to address these challenges.
Specialized Tools and Practices: Operating a cell-based architecture requires specialized operational tools and practices to effectively manage multiple instances of workloads.

Routing Layer Investment: A robust cell routing layer is essential for directing traffic appropriately across cells, necessitating additional investment in technology and expertise.

Navigating the Trade-Offs

Opting for cell-based architecture involves navigating these trade-offs and evaluating whether the benefits in scalability, resilience, and operational agility outweigh the complexities of implementation and management. It is most suitable for services requiring high availability, those undergoing rapid expansion, or systems where modular scaling and failure isolation are critical.

Best Practices and Pitfalls

Best Practices

Adopting a cell-based architecture can significantly enhance the scalability and resilience of your applications. Here are streamlined best practices for implementing this approach effectively:

Begin With a Solid Foundation

Treat Your Current Setup as Cell Zero: View your existing system as the initial cell, gradually introducing traffic routing and distribution across new cells.

Launch With Multiple Cells: Implement more than one cell from the beginning to quickly learn and adapt to the operational dynamics of a cell-based environment.

Plan for Flexibility and Growth

Implement a Cell Migration Mechanism Early: Prepare for the need to move customers between cells, ensuring you can scale and adjust without disruption.

Focus on Reliability

Conduct a Failure Mode Analysis: Identify and assess potential failures within each cell and their impact, developing strategies to ensure robustness and minimize cross-cell effects.

Ensure Independence and Security

Maintain Cell Autonomy: Design cells to be self-sufficient, with dedicated resources and clear ownership, possibly by a single team.

Secure Communication: Use versioned, well-defined APIs for cell interactions and enforce security policies at the API gateway level.

Minimize Dependencies: Keep inter-cell dependencies low to preserve the architecture's benefits, such as fault isolation.

Optimize Deployment and Operations

Avoid Shared Resources: Each cell should have its own data storage to eliminate global state dependencies.

Deploy in Waves: Introduce updates and deployments in phases across cells for better change management and quick rollback capabilities.

By following these practices, you can leverage cell-based architecture to create systems that are scalable and resilient, but also manageable and secure, ready to meet the challenges of modern digital demands.

Common Pitfalls

While cell-based architecture offers significant advantages for scalability and resilience, it also introduces specific challenges and pitfalls that organizations need to be aware of when adopting this approach:

Complexity in Management and Operations

Increased Operational Overhead: Managing multiple cells can introduce complexity in deployment, monitoring, and operations, requiring robust automation and orchestration tools to maintain efficiency.

Consistency Management: Ensuring data consistency across cells, especially in stateful applications, can be challenging and might require sophisticated synchronization mechanisms.

Initial Setup and Migration Challenges

Complex Migration Process: Transitioning to a cell-based architecture from a traditional setup can be complex, requiring careful planning to avoid service disruption and data loss.
Steep Learning Curve: Teams may face a learning curve in understanding cell-based concepts and best practices, necessitating training and potentially slowing initial progress.

Design and Architectural Considerations

Cell Isolation: Properly isolating cells to prevent failure propagation requires meticulous design; without it, the system might not fully realize the benefits of fault isolation.

Optimal Cell Size: Determining the optimal size for cells can be tricky, as overly small cells may lead to inefficiencies, while overly large cells might compromise scalability and resilience.

Resource Utilization and Cost Implications

Potential for Increased Costs: If not carefully managed, the duplication of resources across cells can lead to increased operational costs.

Underutilization of Resources: Balancing resource allocation to prevent underutilization while avoiding over-provisioning requires continuous monitoring and adjustment.

Networking and Communication Overhead

Network Complexity: Cell-based architecture may introduce additional network complexity, including the need for sophisticated routing and load-balancing strategies.

Inter-Cell Communication: Ensuring efficient and secure communication between cells, especially in geographically distributed setups, can introduce latency and requires secure, reliable networking solutions.

Security and Compliance

Security Configuration: Each cell's need for individual security configuration can complicate enforcing consistent security policies across the architecture.

Compliance Verification: Verifying that each cell complies with regulatory requirements can be more challenging in a distributed architecture, requiring robust auditing mechanisms.

Scalability vs. Cohesion Trade-Off

Dependency Management: While minimizing dependencies between cells enhances fault tolerance, it can also make it harder to maintain application cohesion and consistency.

Data Duplication: Avoiding shared resources may result in data duplication and synchronization challenges, impacting system performance and consistency.

To mitigate these pitfalls, organizations should invest in robust planning, adopt comprehensive automation and monitoring tools, and ensure ongoing team training. Understanding these challenges upfront helps in designing a more resilient, scalable, and efficient cell-based architecture.

Cell-Based Wins in the Real World

Cell-based architecture has become essential for managing scalability and ensuring system resilience, from high-growth startups to tech giants like Amazon and Facebook. This architectural model has been adopted across various industries, reflecting its effectiveness in handling large-scale, critical workloads. Here's a brief look at how DoorDash, Slack, and Roblox have implemented cell-based architecture to address their unique challenges.

DoorDash's Transition to Cell-Based Architecture

Faced with the demands of hypergrowth, DoorDash migrated from a monolithic system to a cell-based architecture, marking a pivotal shift in its operational strategy. This transition, known as Project SuperCell, was driven by the need to efficiently manage fluctuating demand and maintain consistent service reliability across diverse markets. By leveraging AWS's cloud infrastructure, DoorDash was able to isolate failures within individual cells, preventing widespread system disruptions.
The migration significantly enhanced DoorDash's ability to scale resources and maintain service reliability, even during peak times, demonstrating the transformative potential of a cell-based approach.

Slack's Migration to Cell-Based Architecture

Slack underwent a major shift to a cell-based architecture to lessen the impact of gray failures and boost service redundancy. The move was prompted by the review of a network outage, which revealed the risks of depending solely on a single availability zone. The new cellular structure aims to confine failures more effectively and minimize the extent of potential site outages. With the adoption of isolated services in each availability zone, Slack has enabled its internal services to function independently within each zone, curtailing the fallout from outages and speeding up the recovery process. This significant redesign has markedly improved Slack's system resilience, underscoring cell-based architecture's role in ensuring high service availability and quality.

Roblox's Strategic Shift to Cellular Infrastructure

Roblox's shift to a cell-based architecture showcases its response to rapid growth and the need to support over 70 million daily active users with reliable, low-latency experiences. By adopting a cellular infrastructure, Roblox created isolated clusters within its data centers, enhancing system resilience through service replication across cells. This setup allowed for the deactivation of non-functional cells without disrupting service, effectively containing failures. The move to cellular infrastructure has significantly boosted Roblox's system reliability, enabling the platform to offer always-on, immersive experiences worldwide. This strategy highlights the effectiveness of cell-based architecture in managing large-scale, dynamic workloads and maintaining high service quality as platforms expand.

These examples from DoorDash, Slack, and Roblox illustrate the strategic value of cell-based architecture in addressing the challenges of scale and reliability. By isolating workloads into independent cells, these companies have achieved greater scalability, fault tolerance, and operational efficiency, showcasing the effectiveness of this approach in supporting dynamic, high-demand services.

Key Takeaways

Cell-based architecture represents a transformative approach for organizations aiming to achieve hyper-scalability and resilience in the digital era. Companies like Amazon, Facebook, DoorDash, and Slack have demonstrated its efficacy in managing hypergrowth and ensuring uninterrupted service by segmenting systems into independent, self-sufficient cells. This architectural strategy facilitates dynamic scaling and robust fault isolation, but it also demands careful consideration of increased complexity, resource allocation, and the need for specialized operational tools. As businesses continue to navigate the demands of digital growth, cell-based architecture emerges as a strategic solution for sustaining operational integrity and delivering consistent user experiences amidst an ever-evolving digital landscape.

Acknowledgments

This article draws upon the collective knowledge and experiences of industry leaders and practitioners, including insights from technical blogs, case studies from companies like Amazon, Slack, and DoorDash, and contributions from the wider tech community.
References

https://docs.aws.amazon.com/wellarchitected/latest/reducing-scope-of-impact-with-cell-based-architecture/reducing-scope-of-impact-with-cell-based-architecture.html
https://github.com/wso2/reference-architecture/blob/master/reference-architecture-cell-based.md
https://newsletter.systemdesign.one/p/cell-based-architecture
https://highscalability.com/cell-architectures/
https://www.youtube.com/watch?v=ReRrhU-yRjg
https://slack.engineering/slacks-migration-to-a-cellular-architecture/
https://blog.roblox.com/2023/12/making-robloxs-infrastructure-efficient-resilient/
Services, or servers, are software components or processes that execute operations on specified inputs, producing either actions or data depending on their purpose. The party making the request is the client, while the server manages the request process. Typically, communication between client and server occurs over a network, utilizing protocols such as HTTP for REST or gRPC. Services may include a User Interface (UI) or function solely as backend processes. With this background, we can explore the steps and rationale behind developing a scalable service. NOTE: This article does not provide instructions on service or UI development, leaving you the freedom to select the language or tech stack that suits your requirements. Instead, it offers a comprehensive perspective on constructing and expanding a service, reflecting what startups need to do in order to scale a service. Additionally, it's important to recognize that while this approach offers valuable insights into computing concepts, it's not the sole method for designing systems. The Beginning: Version Control Assuming clarity on the presence of a UI and the general purpose of the service, the initial step prior to service development involves implementing a source control/version control system to support the code. This typically entails utilizing tools like Git, Mercurial, or others to back up the code and facilitate collaboration, especially as the number of contributors grows. It's common for startups to begin with Git as their version control system, often leveraging platforms like github.com for hosting Git repositories. An essential element of version control is pull requests, facilitating peer reviews within your team. This process enhances code quality by allowing multiple individuals to review and approve proposed changes before integration. While I won't delve into specifics here, a quick online search will provide ample information on the topic. Developing the Service Once version control is established, the next step involves setting up a repository and initiating service development. This article adopts a language-agnostic approach, as delving into specific languages and optimal tech stacks for every service function would be overly detailed. For conciseness, let's focus on a service that executes functions based on inputs and necessitates backend storage (while remaining neutral on the storage solution, which will be discussed later). As you commence service development, it's crucial to grasp how to run it locally on your laptop or in any developer environment. One should consider this aspect carefully, as local testing plays a pivotal role in efficient development. While crafting the service, ensure that classes, functions, and other components are organized in a modular manner, into separate files as necessary. This organizational approach promotes a structured repository and facilitates comprehensive unit test coverage. Unit tests represent a critical aspect of testing that developers should rigorously prioritize. There should be no compromises in this regard! Countless incidents or production issues could have been averted with the implementation of a few unit tests. Neglecting this practice can potentially incur significant financial costs for a company. I won't delve into the specifics of integrating the gRPC framework, REST packages, or any other communication protocols. You'll have the freedom to explore and implement these as you develop the service. 
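To make the unit-testing point concrete, here is a minimal, hypothetical sketch in Go (the package, function, and file names are illustrative, not prescriptive): a small pure function and a table-driven test that exercises it. Runnable with go test.

Go

// pricing.go (hypothetical module under test)
package pricing

// Total returns the total price in cents for a quantity of items.
func Total(unitCents int, quantity int) int {
    return unitCents * quantity
}

// pricing_test.go (table-driven test for the function above)
package pricing

import "testing"

func TestTotal(t *testing.T) {
    cases := []struct {
        name      string
        unitCents int
        quantity  int
        want      int
    }{
        {"single item", 250, 1, 250},
        {"multiple items", 250, 4, 1000},
        {"zero quantity", 250, 0, 0},
    }
    for _, c := range cases {
        if got := Total(c.unitCents, c.quantity); got != c.want {
            t.Errorf("%s: Total(%d, %d) = %d, want %d",
                c.name, c.unitCents, c.quantity, got, c.want)
        }
    }
}

Even trivial tests like these catch the regressions that creep in as the service grows; the table style makes it cheap to add cases later.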
Once the service is executable and tested through unit tests and basic manual testing, the next step is to explore how to make it "deployable." Packaging the Service Ensuring the service is "deployable" implies having a method to run the process in a more manageable manner. Let's delve into this concept further. What exactly does this entail? Now that we have a runnable process, who will initiate it initially? Moreover, where will it be executed? Addressing these questions is crucial, and we'll now proceed to provide answers. In my humble opinion, managing your own compute infrastructure might not be the best approach. There are numerous intricacies involved in ensuring that your service is accessible on the Internet. Opting for a cloud service provider (CSP) is a wiser choice, as they handle much of the complexity behind the scenes. For our purposes, any available cloud service provider will suffice. Once a CSP is selected, the next consideration is how to manage the process. We aim to avoid manual intervention every time the service crashes, especially without notification. The solution lies in orchestrating our process through containerization. This involves creating a container image for our process, essentially a filesystem containing all necessary dependencies at the application layer. A "Dockerfile" is used to specify the steps for including the process and dependencies in the container image. Upon completion of the Dockerfile, the docker build CLI can be used to generate an image with tags. This image is then stored locally or pushed to a container registry, serving as a repository for container images that can later be pulled onto a compute instance (a short end-to-end sketch follows this section). With these steps outlined, the next question arises: how does containerization orchestrate our process? This will be addressed in the following section on executing a container. Executing the Container After building a container image, the subsequent step is its execution, which in turn initiates the service we've developed. Various container runtimes, such as containerd, podman, and others, are available to facilitate this process. In this context, we utilize the docker CLI to manage the container, which interacts with containerd in the background. Running a container is straightforward: "docker run" executes the container and consequently, the developed process. You may observe logs in the terminal (if not run as a daemon) or use "docker logs" to inspect service logs if necessary. Additionally, options like "--restart" can be included in the command to automatically restart the container (i.e., the process) in the event of a crash, allowing for customization as required. At this stage, we have our process encapsulated within a container, ready for execution/orchestration as required. While this setup is suitable for local testing, our next step involves exploring how to deploy this on a basic compute instance within our chosen CSP. Deploying the Container Now that we have a container, it's advisable to publish it to a container registry. Numerous container registries are available, managed by CSPs or docker itself. Once the container is published, it becomes easily accessible from any CSP or platform. We can pull the image and run it on a compute instance, such as a Virtual Machine (VM), allocated within the CSP. Starting with this option is typically the most cost-effective and straightforward. 
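To tie the packaging and execution steps together, here is a minimal sketch, assuming a Go service with its entry point at cmd/server and a Go module at the repository root; the image name, tag, and registry host are all illustrative placeholders.

Dockerfile

# Hypothetical multi-stage Dockerfile for a Go service
FROM golang:1.21 AS build
WORKDIR /app
COPY . .
RUN go build -o /server ./cmd/server

FROM debian:stable-slim
COPY --from=build /server /server
EXPOSE 8080
CMD ["/server"]

Shell

# Build, tag, and publish the image (registry host is a placeholder)
docker build -t myservice:v1 .
docker tag myservice:v1 myregistry.example.com/myservice:v1
docker push myregistry.example.com/myservice:v1

# Run it, restarting automatically if the process crashes
docker run -d --restart unless-stopped -p 8080:8080 myservice:v1

The --restart policy is what gives us the "no manual intervention on crash" behavior described above, at least at the single-host level.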
While we briefly touch on other forms of compute infrastructure later in this article, deploying on a VM involves pulling a container image and running it, much like we did in our developer environment. Voila! Our service is deployed. However, ensuring accessibility to the world requires careful consideration. While directly exposing the VM's IP to the external world may seem tempting, it poses security risks. Implementing TLS for security is crucial. Instead, a better approach involves using a reverse proxy to route requests to specific services. This ensures security and facilitates the deployment of multiple services on the same VM. To enable internet access to our service, we require a method for inbound traffic to reach our VM. An effective solution involves installing a reverse proxy like Nginx directly on the VM. This can be achieved by pulling the Nginx container image, typically tagged "nginx:latest". Before launching the container, it's necessary to configure Nginx settings such as servers, locations, and additional configurations (a minimal configuration sketch follows at the end of this section). Security measures like TLS can also be implemented for enhanced protection. Once the Nginx configuration is established, it can be exposed to the container through volumes during container execution. This setup allows the reverse proxy to effectively route incoming requests to the container running on the same VM, using a specified port. One notable advantage is the ability to host multiple services within the VM, with routing efficiently managed by the reverse proxy. To finalize the setup, we must expose the VM's IP address and proxy port to the internet, with TLS encryption supported by the reverse proxy. This can typically be configured through the CSP's settings. NOTE: The examples of solutions provided below may reference GCP as the CSP. This is purely for illustrative purposes and should not be interpreted as a recommendation; the intention is simply to convey the concepts effectively. Consider the scenario where managing a single VM manually becomes laborious and lacks scalability. To address this challenge, CSPs offer solutions akin to managed instance groups, comprising multiple VMs configured identically. These groups often come with features like startup scripts, which execute upon VM initialization. All the configurations discussed earlier can be scripted into these startup scripts, simplifying the process of VM launch and enhancing scalability. This setup proves beneficial when multiple VMs are required to handle requests efficiently. Now, the question arises: when dealing with multiple VMs, how do we decide where to route requests? The solution is to employ a load balancer provided by the CSP. This load balancer selects one VM from the pool to handle each request. We can streamline this further with a general load-balancing setup: instead of running a reverse proxy on each VM, we can create a separate instance group for every service needed, fronted by its own load balancer. The general load balancer exposes its IP with TLS termination and route configuration, so only the service containers need to run on the VMs. It's essential to ensure that VM IPs and ports are accessible solely by the load balancer in the ingress path, a task achievable through configurations provided by the CSP. At this juncture, we have a load balancer securely managing requests, directing them to the specific container service within a VM from a pool of VMs. This setup itself contributes to scaling our service. 
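As a concrete illustration of the reverse-proxy setup described earlier in this section, here is a minimal, hypothetical Nginx configuration; the domain name, certificate paths, route prefix, and ports are all placeholders for whatever your service actually uses.

Nginx configuration

# /etc/nginx/conf.d/myservice.conf (mounted into the nginx container via a volume)
server {
    listen 443 ssl;
    server_name example.com;

    ssl_certificate     /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    # Route requests for /myservice/ to the service container on this VM
    location /myservice/ {
        proxy_pass http://127.0.0.1:8080/;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Shell

# Run Nginx with the config and certs mounted read-only
docker run -d -p 443:443 \
  -v /etc/nginx/conf.d:/etc/nginx/conf.d:ro \
  -v /etc/nginx/certs:/etc/nginx/certs:ro \
  nginx:latest

Additional location blocks can route other path prefixes to other containers on the same VM, which is the multi-service hosting advantage mentioned above.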
To further enhance scalability and eliminate the need for continuous VM operation, we can opt for an autoscaler policy. This policy dynamically scales the VM group up or down based on parameters such as CPU, memory, or others provided by the CSP. Now, let's delve into the concept of Infrastructure as Code (IaC), which holds significant importance in efficiently managing CSP components that promote scale. Essentially, IaC involves describing CSP infrastructure components in declarative configuration files, which an IaC tool (like Terraform) interprets and applies to provision and manage that infrastructure (see the sketch at the end of this section). For more detailed information, refer to the wiki. Datastore We've previously discussed scaling our service, but it's crucial to remember that there's typically a requirement to maintain a state somewhere. This is where databases or datastores play a pivotal role. From experience, handling this aspect can be quite tricky, and I would once again advise against developing a custom solution. CSP solutions are ideally suited for this purpose. CSPs generally handle the complexity associated with managing databases, addressing concepts such as master-slave architecture, replica management, synchronous-asynchronous replication, backups/restores, consistency, and other intricate aspects more effectively. Managing a database can be challenging due to concerns about data loss arising from improper configurations. Each CSP has different database offerings, and it's essential to consider the specific use cases the service deals with to choose the appropriate offering. For instance, one may need to decide between using a relational database offering versus a NoSQL offering. This article does not delve into these differences. The database should be accessible from the VM group and serve as a central datastore for all instances where the state is shared. It's worth noting that the database or datastore should only be accessible within the VPC, and ideally, only from the VM group. This is crucial to prevent exposing the ingress IP for the database, ensuring security and data integrity. Queues In service design, we often encounter scenarios where certain tasks need to be performed asynchronously. This means that upon receiving a request, part of the processing can be deferred to a later time without blocking the response to the client. One common approach is to utilize databases as queues, where requests are ordered by some identifier. Alternatively, CSP services such as Amazon SQS or GCP Pub/Sub can be employed for this purpose. Messages published to the queue can then be retrieved for processing by a separate service that listens to the queue. However, we won't delve into the specifics here. Monitoring In addition to the VM-level monitoring typically provided by the CSP, there may be a need for more granular insights through service-level monitoring. For instance, one might require latency metrics for database requests, metrics based on queue interactions, or metrics for service CPU and memory utilization. These metrics should be collected and forwarded to a monitoring solution such as Datadog, Prometheus, or others. These solutions are typically backed by a time-series database (TSDB), allowing users to gain insights into the system's state over a specific period of time. This monitoring setup also facilitates debugging certain types of issues and can trigger alerts or alarms if configured to do so. Alternatively, you can set up your own Prometheus deployment, as it is an open-source solution. 
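To make the IaC idea concrete, here is a minimal Terraform sketch that codifies the autoscaler policy described above for a GCP managed instance group. The resource names, zone, and thresholds are illustrative, and it assumes an instance group manager named "svc" is defined elsewhere in the same configuration.

Terraform

# Hypothetical autoscaler for a managed instance group (GCP provider)
resource "google_compute_autoscaler" "svc" {
  name   = "svc-autoscaler"
  zone   = "us-central1-a"
  target = google_compute_instance_group_manager.svc.id

  autoscaling_policy {
    min_replicas    = 2
    max_replicas    = 10
    cooldown_period = 60

    # Add VMs when average CPU utilization exceeds 60%
    cpu_utilization {
      target = 0.6
    }
  }
}

Because this lives in version control alongside the code, the scaling behavior of the fleet is reviewable, reproducible, and revertible like any other change.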
With the aforementioned concepts, it should be feasible to deploy a scalable service. This level of scalability has proven sufficient for numerous startups that I have consulted for. Moving forward, we'll explore the utilization of a "container orchestrator" instead of deploying containers in VMs, as described earlier. In this article, we'll use Kubernetes (k8s) as an example to illustrate this transition. Container Orchestration: Enter Kubernetes (K8s) Having implemented the aforementioned design, we can effectively manage numerous requests to our service. Now, our objective is to achieve decoupling to further enhance scalability. This decoupling is crucial because a bug in any service within a VM could lead to the VM crashing, potentially causing the entire ecosystem to fail. Moreover, decoupled services can be scaled independently. For instance, one service may have sufficient scalability and effectively handle requests, while another may struggle with the load. Consider the example of a shopping website where the catalog may receive significantly more visits than the checkout page. Consequently, the scale of read requests may far exceed that of checkouts. In such cases, deploying multiple service containers into Kubernetes (K8s) as distinct services allows for independent scaling. Before delving into specifics, it's worth noting that CSPs offer Kubernetes as a compute platform option, which is essential for scaling to the next level. Kubernetes (K8s) We won't delve into the intricacies of Kubernetes controllers or other aspects in this article. The information provided here will suffice to deploy a service on Kubernetes. Kubernetes (K8s) serves as an abstraction over a cluster of nodes with storage and compute resources. Depending on where the service is scheduled, the node provides the necessary compute and storage capabilities. Having container images is essential for deploying a service on Kubernetes (K8s). Resources in K8s are represented by creating configurations, which can be in YAML or JSON format, and they define specific K8s objects. These objects belong to a particular "namespace" within the K8s cluster. The basic unit of compute within K8s is a "Pod," which can run one or more containers. Therefore, a config for a pod can be created, and the service can then be deployed onto a namespace using the K8s CLI, kubectl. Once the pod is created, your service is essentially running, and you can monitor its state using kubectl with the namespace as a parameter. To deploy multiple pods, a "deployment" is required. Kubernetes (K8s) offers various resources such as deployments, stateful sets, and daemon sets. The K8s documentation provides sufficient explanations for these abstractions, so we won't discuss each of them here. A deployment is essentially a resource designed to deploy multiple pods of a similar kind. This is achieved through the "replicas" option in the configuration, and you can also choose an update strategy according to your requirements. Selecting the appropriate update strategy is crucial to ensure there is no downtime during updates. Therefore, in our scenario, we would utilize a deployment for our service that scales to multiple pods (a minimal sketch follows). When employing a Deployment to oversee your application, Pods can be dynamically generated and terminated. Consequently, the count and identities of operational and healthy Pods may vary unpredictably. 
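Here is a minimal sketch of such a Deployment; the service name, namespace, image, and port are illustrative placeholders, and the RollingUpdate strategy is the one you would typically pick to avoid downtime during updates.

YAML

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myservice
  namespace: prod
spec:
  replicas: 3
  strategy:
    type: RollingUpdate   # replace pods gradually so some always serve traffic
  selector:
    matchLabels:
      app: myservice
  template:
    metadata:
      labels:
        app: myservice
    spec:
      containers:
        - name: myservice
          image: myregistry.example.com/myservice:v1   # hypothetical image
          ports:
            - containerPort: 8080

Applying it with kubectl apply -f deployment.yaml and watching kubectl get pods -n prod shows the replicas come and go as the Deployment reconciles its desired state.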
Kubernetes manages the creation and removal of Pods to sustain the desired state of your cluster, treating Pods as transient resources with no assured reliability or durability. Each Pod is assigned its own IP address, typically managed by network plugins in Kubernetes. As a result, the set of Pods linked with a Deployment can fluctuate over time, presenting a challenge for components within the cluster to consistently locate and communicate with specific Pods. This challenge is mitigated by employing a Service resource. After establishing a service object, the subsequent topic of discussion is Ingress. Ingress is responsible for routing to multiple services within the cluster. It facilitates the exposure of HTTP, HTTPS, or even gRPC routes from outside the cluster to services within it. Traffic routing is managed by rules specified on the Ingress resource, which is supported by a load balancer operating in the background. With all these components deployed, our service has attained a commendable level of scalability. It's worth noting that the concepts discussed prior to entering the Kubernetes realm are mirrored here in a way — we have load balancers, containers, and routes, albeit implemented differently. Additionally, there are other objects such as Horizontal Pod Autoscaler (HPA) for scaling pods based on memory/CPU utilization, and storage constructs like Persistent Volumes (PV) or Persistent Volume Claims (PVC), which we won't delve into extensively. Feel free to explore these for a deeper understanding. CI/CD Lastly, I'd like to address an important aspect of enhancing developer efficiency: Continuous Integration/Deployment (CI/CD). Continuous Integration (CI) involves running automated tests (such as unit, end-to-end, or integration tests) on any developer pull request or check-in to the version control system, typically before merging. This helps identify regressions and bugs early in the development process. After merging, CI generates images and other artifacts required for service deployment. Tools like Jenkins (Jenkins X), Tekton, GitHub Actions, and others facilitate CI processes. Continuous Deployment (CD) automates the deployment process, promoting releases through different environments, such as development, staging, or production. Usually, the development environment is deployed first, followed by running several end-to-end tests to identify any issues. If everything functions correctly, CD proceeds to deploy to other environments. All the aforementioned tools also support CD functionalities. CI/CD tools significantly improve developer efficiency by reducing manual work. They are essential to ensure developers don't spend hours on manual tasks. Additionally, during manual deployments, it's crucial to ensure no one else is deploying to the same environment simultaneously to avoid conflicts, a concern that can be addressed effectively by our CD framework. There are other aspects, like dynamic config management, securely storing secrets/passwords, and logging systems; though we won't delve into the details, I would encourage readers to look into the links provided. Thank you for reading!
Building scalable systems using microservices architecture is a strategic approach to developing complex applications. Microservices allow teams to deploy and scale parts of their application independently, improving agility and reducing the complexity of updates and scaling. This step-by-step guide outlines the process of creating a microservices-based system, complete with detailed examples. 1. Define Your Service Boundaries Objective Identify the distinct functionalities within your system that can be broken down into separate, smaller services. Example Consider an e-commerce platform. Key functionalities that can be microservices include: User management service: Handles user registration, authentication, and profile management. Product catalog service: Manages product listings, categories, and inventory. Order processing service: Takes care of orders, payments, and shipping. 2. Choose Your Technology Stack Objective Select the technologies and frameworks that will be used to develop each microservice. Example User management service: Node.js with Express for RESTful API development. Product catalog service: Python with Flask and SQLAlchemy for database interactions. Order processing service: Java with Spring Boot for leveraging enterprise-grade features. 3. Setup Development Environment Objective Prepare the local and shared development environments for your team. Example Use Docker containers for each microservice to ensure consistency between development, testing, and production environments. Docker Compose can help manage multi-container setups. Isolate Development Environments Docker Containers Utilize Docker to containerize each microservice. Containers package the microservice with its dependencies, ensuring consistency across development, testing, and production environments. This isolation helps in eliminating the "it works on my machine" problem by providing a uniform platform for all services. Docker Compose For local development, Docker Compose can be used to define and run multi-container Docker applications. With Compose, you can configure your application’s services, networks, and volumes in a single YAML file, making it easier to launch your entire microservices stack with a single command (docker-compose up). Version Control and Repository Management Git Adopt Git for version control, allowing developers to work on features in branches, merge changes, and track the history of your microservices independently. This practice supports the microservices philosophy of decentralized data management. Repository Per Service Consider maintaining a separate repository for each microservice. This approach enhances modularity and allows each team or developer working on a service to operate autonomously. 4. Implement the Microservices Implementing microservices involves developing individual services that are part of a larger, distributed system. Each microservice is responsible for a specific business capability and can be developed, deployed, and scaled independently. This section provides a detailed overview of implementing microservices with practical examples, focusing on a hypothetical e-commerce platform comprised of User Management, Product Catalog, and Order Processing services. User Management Service Objective: Handle user registration, authentication, and profile management. Technology Stack: Node.js with Express framework for building RESTful APIs. 
Example Implementation:

JavaScript

const express = require('express');
const bcrypt = require('bcrypt');
const jwt = require('jsonwebtoken');

const app = express();
app.use(express.json());

// User registration endpoint
app.post('/register', async (req, res) => {
  const { username, password } = req.body;
  const hashedPassword = await bcrypt.hash(password, 10);
  // Save the user with the hashed password in the database
  // Placeholder for database operation
  res.status(201).send({ message: 'User registered successfully' });
});

// User login endpoint
app.post('/login', async (req, res) => {
  const { username, password } = req.body;
  // Placeholder for fetching the user (and their stored password hash) from the database
  // NOTE: mocked here with a hash of the supplied password so the example runs;
  // a real implementation would load the stored hash for this username
  const user = { username, password: await bcrypt.hash(password, 10) }; // Mock user
  const isValid = await bcrypt.compare(password, user.password);
  if (isValid) {
    const token = jwt.sign({ username }, 'secretKey', { expiresIn: '1h' });
    res.status(200).send({ token });
  } else {
    res.status(401).send({ message: 'Invalid credentials' });
  }
});

app.listen(3000, () => console.log('User Management Service running on port 3000'));

Product Catalog Service Objective: Manage product listings, categories, and inventory. Technology Stack: Python with Flask for simplicity and SQLAlchemy for ORM. Example Implementation:

Python

from flask import Flask, request, jsonify
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///products.db'
db = SQLAlchemy(app)

class Product(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(50), nullable=False)
    price = db.Column(db.Float, nullable=False)

@app.route('/products', methods=['POST'])
def add_product():
    data = request.get_json()
    new_product = Product(name=data['name'], price=data['price'])
    db.session.add(new_product)
    db.session.commit()
    return jsonify({'message': 'Product added successfully'}), 201

@app.route('/products', methods=['GET'])
def get_products():
    products = Product.query.all()
    output = []
    for product in products:
        product_data = {'name': product.name, 'price': product.price}
        output.append(product_data)
    return jsonify({'products': output})

if __name__ == '__main__':
    app.run(debug=True)

Order Processing Service Objective: Handle orders, payments, and shipping. Technology Stack: Java with Spring Boot for leveraging enterprise-grade features. Example Implementation:

Java

@RestController
@RequestMapping("/orders")
public class OrderController {

    @Autowired
    private OrderRepository orderRepository;

    @PostMapping
    public ResponseEntity<?> placeOrder(@RequestBody Order order) {
        orderRepository.save(order);
        return ResponseEntity.ok().body("Order placed successfully");
    }

    @GetMapping("/{userId}")
    public ResponseEntity<List<Order>> getUserOrders(@PathVariable Long userId) {
        List<Order> orders = orderRepository.findByUserId(userId);
        return ResponseEntity.ok().body(orders);
    }
}

5. Database Design Objective Design and implement databases for each microservice independently. Microservices often necessitate different database technologies based on their unique data requirements, a concept known as polyglot persistence. Example User Management Service: Use a PostgreSQL database with tables for `Users` and `Roles`. PostgreSQL: A powerful, open-source object-relational database system that offers reliability, feature robustness, and performance. Ideal for services requiring complex queries and transactions, such as the User Management Service. Product Catalog Service: Opt for a MongoDB database to store `Products` and `Categories` documents. 
MongoDB: A NoSQL document database known for its flexibility and scalability. It's well-suited for the Product Catalog Service, where the schema-less nature of MongoDB accommodates diverse product information. Order Processing Service: Implement a MySQL database with tables for `Orders`, `OrderItems`, and `ShippingDetails`. MySQL: Another popular open-source relational database, MySQL is known for its reliability and is widely used in web applications. The Order Processing Service, which might involve structured data with relationships (orders, order items, etc.), could leverage MySQL's capabilities. 6. Communication Between Microservices Objective Establish methods for inter-service communication while maintaining loose coupling. Example Use asynchronous messaging for communication between services to enhance scalability and resilience. For instance, when an order is placed, the Order Processing Service publishes an `OrderCreated` event to a message broker (e.g., RabbitMQ or Apache Kafka), which the Inventory Service subscribes to for updating stock levels. Example: Order Processing System Consider an e-commerce platform with microservices for user management, product catalog, order processing, and inventory management. When a customer places an order, the order processing service needs to interact with both the inventory and user management services to complete the order. This example outlines how asynchronous communication via an event-driven approach can facilitate these interactions. Services User management service: Manages user information and authentication. Inventory management service: Keeps track of product stock levels. Order processing service: Handles order creation, updates, and status tracking. Scenario: Placing an Order Customer places an order: The customer adds items to their cart and initiates the checkout process. Order Processing Service Receives the order request and generates a new order with a pending status. It then publishes an OrderCreated event to a message broker (e.g., Apache Kafka) containing the order details, including product IDs and quantities.

JSON

{
  "eventType": "OrderCreated",
  "orderId": "12345",
  "userId": "67890",
  "items": [
    {"productId": "111", "quantity": 2},
    {"productId": "222", "quantity": 1}
  ]
}

Inventory Management Service Subscribes to the OrderCreated event. Upon receiving an event, it checks if the inventory can fulfill the order. If so, it updates the inventory accordingly and publishes an OrderConfirmed event. If not, it publishes an OrderFailed event.

JSON

{
  "eventType": "OrderConfirmed",
  "orderId": "12345"
}

Order Processing Service Subscribes to both OrderConfirmed and OrderFailed events. Depending on the event type received, it updates the order status to either confirmed or failed and notifies the customer. User Management Service Although not directly involved in this scenario, it could subscribe to order events to update user metrics or trigger loyalty program updates. (A short sketch of publishing such an event appears at the end of this section.) 7. Deploy and Monitor Deploying microservices involves several critical steps, from containerization to orchestration and monitoring. This guide will outline a comprehensive approach to deploying your microservices, ensuring they are scalable, maintainable, and resilient. We'll use Docker for containerization and Kubernetes for orchestration, given their widespread adoption and robust ecosystem. 
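Before diving into the deployment steps, here is the event-publishing sketch promised in Step 6: a minimal example of how the Order Processing Service might publish its OrderCreated event, assuming the kafka-python client, a broker on localhost, and an "orders" topic (the topic name and broker address are illustrative assumptions).

Python

# Minimal sketch of publishing the OrderCreated event with kafka-python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    # Serialize dicts to JSON bytes so subscribers see the payloads shown above
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
)

def publish_order_created(order_id, user_id, items):
    event = {
        "eventType": "OrderCreated",
        "orderId": order_id,
        "userId": user_id,
        "items": items,
    }
    producer.send('orders', event)
    producer.flush()  # block until the broker has acknowledged the event

publish_order_created("12345", "67890",
                      [{"productId": "111", "quantity": 2},
                       {"productId": "222", "quantity": 1}])

The Inventory Management Service would run a corresponding consumer on the same topic, which is what keeps the two services loosely coupled.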
Containerize Your Microservices Objective Package each microservice into its own container to ensure consistency across different environments. Example: Create a Dockerfile For each microservice, create a Dockerfile that specifies the base image, dependencies, and build instructions.

Dockerfile

# Example Dockerfile for a Node.js microservice
FROM node:14
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["node", "app.js"]

Build Docker Images Run docker build to create images for your microservices.

Shell

docker build -t user-management-service:v1 .

Push Images to a Registry Objective: Upload your Docker images to a container registry like Docker Hub or AWS Elastic Container Registry (ECR). Example: Tag Your Images Ensure your images are properly tagged with the registry's address.

Shell

docker tag user-management-service:v1 myregistry/user-management-service:v1

Push to Registry Upload your images to the chosen registry.

Shell

docker push myregistry/user-management-service:v1

Define Your Kubernetes Deployment Objective Create Kubernetes deployment files for each microservice to specify how they should be deployed and managed.

YAML

# Example Kubernetes deployment for a microservice
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-management-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-management
  template:
    metadata:
      labels:
        app: user-management
    spec:
      containers:
        - name: user-management
          image: myregistry/user-management-service:v1
          ports:
            - containerPort: 3000

Deploy to Kubernetes Objective Use kubectl, the command-line tool for Kubernetes, to deploy your microservices to a Kubernetes cluster. Example: Apply Deployment Deploy your microservices using the deployment files created in the previous step.

Shell

kubectl apply -f user-management-deployment.yaml

Verify Deployment Check the status of your deployment to ensure the pods are running correctly.

Shell

kubectl get deployments
kubectl get pods

Expose Your Microservices Objective Make your microservices accessible via a stable endpoint. Example: Create a Service Define a Kubernetes service for each microservice to expose it either internally within the cluster or externally.

YAML

apiVersion: v1
kind: Service
metadata:
  name: user-management-service
spec:
  type: LoadBalancer
  ports:
    - port: 80
      targetPort: 3000
  selector:
    app: user-management

Apply the Service Deploy the service using kubectl.

Shell

kubectl apply -f user-management-service.yaml

Implement Continuous Deployment Objective Automate the deployment process using CI/CD pipelines. Example Configure CI/CD Pipeline Use tools like Jenkins, GitHub Actions, or GitLab CI/CD to automate the testing, building, and deployment of your microservices to Kubernetes. Automate Updates Set up your pipeline to update the microservices in your Kubernetes cluster automatically whenever changes are pushed to your source control repository. Conclusion Building scalable systems with microservices requires careful planning, clear definition of service boundaries, and the selection of appropriate technologies. By following this step-by-step guide and leveraging the examples provided, teams can create a robust microservices architecture that improves scalability, facilitates independent development and deployment, and ultimately enhances the agility and resilience of the entire system.
In the dynamic landscape of microservices, managing communication and ensuring robust security and observability becomes a Herculean task. This is where Istio, a revolutionary service mesh, steps in, offering an elegant solution to these challenges. This article delves deep into the essence of Istio, illustrating its pivotal role in a Kubernetes (KIND) based environment, and guides you through a Helm-based installation process, ensuring a comprehensive understanding of Istio's capabilities and its impact on microservices architecture. Introduction to Istio Istio is an open-source service mesh that provides a uniform way to secure, connect, and monitor microservices. It simplifies configuration and management, offering powerful tools to handle traffic flows between services, enforce policies, and aggregate telemetry data, all without requiring changes to microservice code. Why Istio? In a microservices ecosystem, each service may be developed in different programming languages, have different versions, and require unique communication protocols. Istio provides a layer of infrastructure that abstracts these differences, enabling services to communicate with each other seamlessly. It introduces capabilities like: Traffic management: Advanced routing, load balancing, and fault injection Security: Robust ACLs, RBAC, and mutual TLS to ensure secure service-to-service communication Observability: Detailed metrics, logs, and traces for monitoring and troubleshooting Setting Up a KIND-Based Kubernetes Cluster Before diving into Istio, let's set up a Kubernetes cluster using KIND (Kubernetes IN Docker), a tool for running local Kubernetes clusters using Docker container "nodes." KIND is particularly suited for development and testing purposes.

# Install KIND
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.11.1/kind-$(uname)-amd64
chmod +x ./kind
mv ./kind /usr/local/bin/kind

# Create a cluster
kind create cluster --name istio-demo

This code snippet installs KIND and creates a new Kubernetes cluster named istio-demo. Ensure Docker is installed and running on your machine before executing these commands. Helm-Based Installation of Istio Helm, the package manager for Kubernetes, simplifies the deployment of complex applications. We'll use Helm to install Istio on our KIND cluster. 1. Install Helm First, ensure Helm is installed on your system:

curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh

2. Add the Istio Helm Repository Add the Istio release repository to Helm:

helm repo add istio https://istio-release.storage.googleapis.com/charts
helm repo update

3. Install Istio Using Helm Now, let's install the Istio base chart, the istiod service, and the Istio Ingress Gateway:

# Install the Istio base chart
helm install istio-base istio/base -n istio-system --create-namespace

# Install the Istiod service
helm install istiod istio/istiod -n istio-system --wait

# Install the Istio Ingress Gateway
helm install istio-ingress istio/gateway -n istio-system

This sequence of commands sets up Istio on your Kubernetes cluster, creating a powerful platform for managing your microservices. To enable the Istio injection for the target namespace, use the following command. 
kubectl label namespace default istio-injection=enabled

Exploring Istio's Features To demonstrate Istio's powerful capabilities in a microservices environment, let's use a practical example involving a Kubernetes cluster with Istio installed, and deploy a simple weather application. This application, running in a Docker container brainupgrade/weather-py, serves weather information. We'll illustrate how Istio can be utilized for traffic management, specifically demonstrating a canary release strategy, which is a method to roll out updates gradually to a small subset of users before rolling them out to the entire infrastructure. Step 1: Deploy the Weather Application First, let's deploy the initial version of our weather application using Kubernetes. We will deploy two versions of the application to simulate a canary release. Create a Kubernetes Deployment and Service for the weather application:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: weather-v1
spec:
  replicas: 2
  selector:
    matchLabels:
      app: weather
      version: v1
  template:
    metadata:
      labels:
        app: weather
        version: v1
    spec:
      containers:
        - name: weather
          image: brainupgrade/weather-py:v1
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: weather-service
spec:
  ports:
    - port: 80
      name: http
  selector:
    app: weather

Apply this configuration with kubectl apply -f <file-name>.yaml. Step 2: Enable Traffic Management With Istio Now, let's use Istio to manage traffic to our weather application. We'll start by deploying a Gateway and a VirtualService to expose our application.

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: weather-gateway
spec:
  selector:
    istio: ingress
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: weather
spec:
  hosts:
    - "*"
  gateways:
    - weather-gateway
  http:
    - route:
        - destination:
            host: weather-service
            port:
              number: 80

This setup routes all traffic through the Istio Ingress Gateway to our weather-service. Step 3: Implementing Canary Release Let's assume we have a new version (v2) of our weather application that we want to roll out gradually. We'll adjust our Istio VirtualService to route a small percentage of the traffic to the new version. 1. Deploy version 2 of the weather application:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: weather-v2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: weather
      version: v2
  template:
    metadata:
      labels:
        app: weather
        version: v2
    spec:
      containers:
        - name: weather
          image: brainupgrade/weather-py:v2
          ports:
            - containerPort: 80

2. Adjust the Istio VirtualService to split traffic between v1 and v2:

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: weather
spec:
  hosts:
    - "*"
  gateways:
    - weather-gateway
  http:
    - match:
        - uri:
            prefix: "/"
      route:
        - destination:
            host: weather-service
            subset: v1
            port:
              number: 80
          weight: 90
        - destination:
            host: weather-service
            subset: v2
            port:
              number: 80
          weight: 10

This configuration routes 90% of the traffic to version 1 of the application and 10% to version 2, implementing a basic canary release. Also, enable the DestinationRule as well. 
See the following:

apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: weather-service
  namespace: default
spec:
  host: weather-service
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2

This example illustrates how Istio enables sophisticated traffic management strategies like canary releases in a microservices environment. By leveraging Istio, developers can ensure that new versions of their applications are gradually and safely exposed to users, minimizing the risk of introducing issues. Istio's service mesh architecture provides a powerful toolset for managing microservices, enhancing both the reliability and flexibility of application deployments. Istio and Kubernetes Services Istio and Kubernetes Services are both crucial components in the cloud-native ecosystem, but they serve different purposes and operate at different layers of the stack. Understanding how Istio differs from Kubernetes Services is essential for architects and developers looking to build robust, scalable, and secure microservices architectures. Kubernetes Services Kubernetes Services are a fundamental part of Kubernetes, providing an abstract way to expose an application running on a set of Pods as a network service. With Kubernetes Services, you can utilize the following: Discoverability: Assign a stable IP address and DNS name to a group of Pods, making them discoverable within the cluster. Load balancing: Distribute network traffic or requests among the Pods that constitute a service, improving application scalability and availability. Abstraction: Decouple the front-end service from the back-end workloads, allowing back-end Pods to be replaced or scaled without reconfiguring the front-end clients. Kubernetes Services focus on internal cluster communication, load balancing, and service discovery. They operate at the L4 (TCP/UDP) layer, primarily dealing with IP addresses and ports. Istio Services Istio, on the other hand, extends the capabilities of Kubernetes Services by providing a comprehensive service mesh that operates at a higher level. It is designed to manage, secure, and observe microservices interactions across different environments. Istio's features include: Advanced traffic management: Beyond simple load balancing, Istio offers fine-grained control over traffic with rich routing rules, retries, failovers, and fault injection. It operates at L7 (HTTP/HTTPS/GRPC), allowing behavior to be controlled based on HTTP headers and URLs. Security: Istio provides end-to-end security, including strong identity-based authentication and authorization between services, transparently encrypting communication with mutual TLS, without requiring changes to application code. Observability: It offers detailed insights into the behavior of the microservices, including automatic metrics, logs, and traces for all traffic within a cluster, regardless of the service language or framework. Policy enforcement: Istio allows administrators to enforce policies across the service mesh, ensuring compliance with security, auditing, and operational policies. Key Differences Scope and Layer Kubernetes Services operate at the infrastructure layer, focusing on L4 (TCP/UDP) for service discovery and load balancing. Istio operates at the application layer, providing L7 (HTTP/HTTPS/GRPC) traffic management, security, and observability features. 
Capabilities While Kubernetes Services provide basic load balancing and service discovery, Istio offers advanced traffic management (like canary deployments and circuit breakers), secure service-to-service communication (with mutual TLS), and detailed observability (tracing, monitoring, and logging). Implementation and Overhead Kubernetes Services are integral to Kubernetes and require no additional installation. Istio, being a service mesh, is an add-on layer that introduces additional components (like Envoy sidecar proxies) into the application pods, which can add overhead but also provide enhanced control and visibility. Kubernetes Services and Istio complement each other in the cloud-native ecosystem. Kubernetes Services provide the necessary baseline functionality for service discovery and load balancing within a Kubernetes cluster. Istio extends these capabilities, adding advanced traffic management, enhanced security features, and observability into microservices communications. For applications requiring fine-grained control over traffic, secure communication, and deep observability, integrating Istio with Kubernetes offers a powerful platform for managing complex microservices architectures. Conclusion Istio stands out as a transformative force in the realm of microservices, providing a comprehensive toolkit for managing the complexities of service-to-service communication in a cloud-native environment. By leveraging Istio, developers and architects can significantly streamline their operational processes, ensuring a robust, secure, and observable microservices architecture. Incorporating Istio into your microservices strategy not only simplifies operational challenges but also paves the way for innovative service management techniques. As we continue to explore and harness the capabilities of service meshes like Istio, the future of microservices looks promising, characterized by enhanced efficiency, security, and scalability.
So far in our series on modern microservices, we have built: A simple gRPC service Added a REST/HTTP interface exposing gRPC service RESTfully and showing a glimpse of the gRPC plugin universe Introduced Buf.build to simplify plugin management We are far from productionalizing our service. A production-ready service would (at the very least) need several things: Authentication/Authorization Request logging Request tracing Caching Rate limiting Load balancing And more (security, multi-zone/multi-regional services, etc.) A common thread across all these aspects is that these apply in an (almost) uniform way to all (service) requests (without the operation being aware of it); e.g., given a request handler function (more on this later): Request logging would be printing out common metrics (like response times, error traces, etc.) after calling the handler. Rate limiting can also be applied in a uniform way (by looking up a config of request-specific limits) and only invoking the handler if within those limits. Authentication can look for common request headers (if HTTP) before allowing continuing onto the request handler. In this post, we will describe interceptors: a powerful gRPC facility for (well) intercepting and modifying requests and responses (and streams) on a gRPC server. If you want to jump right into the code, you can find it here. Middleware Before going into gRPC interceptors, let us look at their parallel in the HTTP world - the middleware! In a typical HTTP endpoint (API or otherwise), middleware is used extensively to wrap/decorate/filter requests. The role of middleware is to: Intercept a request from a handler Reject, modify, or forward the request as is to the underlying handler Intercept (the forwarded) request's response, and then modify/forward back to the caller Request handlers are typically functions of the form (req: HTTPRequest) => HTTPResponse (in your favorite language/platform (™)). Naturally, middleware can also be thought of as "decorator" functions that return other handler functions, e.g.:

function mymiddleware(anotherHandler: HTTPHandler): HTTPHandler {
  newHandler = function(req: HTTPRequest): HTTPResponse {
    req = // do some preprocessing and get a modified request
    resp = anotherHandler(req)
    resp = // do some post processing and get a modified response
    return resp
  }
  return newHandler // Return the new handler function
}

So now we could create a very simple rate-limiter (say, for a 5-minute window) with the following:

function rateLimitingMiddleware(originalHandler: HTTPHandler): HTTPHandler {
  return function (req: HTTPRequest): HTTPResponse {
    method = req.method
    path = req.path
    rate_config = getRateLimitConfig(method, path)
    // (5 minutes in seconds)
    ourWindow = 5 * 60
    num_requests = getNumRequestsInWindow(method, path, ourWindow)
    if (num_requests > rate_config.limit) {
      return HTTPResponse(429, 'Too many requests')
    }
    return originalHandler(req)
  }
}

The advantage of middleware is that it can be chained to apply separate concerns without the knowledge of the main request handling the business logic. Thus, a common pattern (say, in a language like Python that supports decorators) would look like:

@ratelimiter_middleware
@authenticator_middleware
@logger_middleware
def main_handler(req: HTTPRequest) -> HTTPResponse:
    return 200, "Hello World"

Without any syntactical decorator support, this could be achieved with:

func createHttpserver() {
  ...
  ...
  widgetHandler := func(w http.ResponseWriter, r *http.Request) {
    // return a widget listing
  }

  mux := http.NewServeMux()
  mux.Handle("/api/widgets/",
    authMiddleware(
      loggerMiddleware(
        rateLimitingMiddleware(
          widgetHandler))))

  http.ListenAndServe("localhost:8080", mux)
  ...
  ...
}

There are fancier things one can do, like apply middleware en-masse to an entire collection of routes. Such framework-specific aesthetics are outside the scope of this post. If you are interested in learning more, check out the amazing Gin Web Framework! Interceptors Now that we have seen their HTTP equivalent, interceptors (in gRPC) are very intuitive. Interceptors are also a way to decorate requests and responses in gRPC. However, they come in two standardized flavors: Unary interceptors: These are "one-shot" interceptors. They either intercept a request or a response, once in the request/response's lifecycle. Stream interceptors: These are "continuous." They intercept every message in a streaming request (client -> server) or a streaming response (server -> client). Since interceptors can apply to both the client and server, we have four total flavors: Client Unary Interceptor - Client Unary Interceptors are for intercepting a request just as it leaves the client, but before it is sent to the server. A typical use case for these could be for a client that may look up a local (on-client) cache for queries instead of forwarding to a server. Another example is for client-side routing, where the client may decide which server-shard to forward a request to based on the entity's ID. Other cases could be to log/monitor client-side latencies of requests, etc. Server Unary Interceptor - These intercept a request that is received by a server (but before forwarding to the request handler). Server-side interceptors are great for common validation of auth tokens or logging/monitoring server-side latencies and errors and more. Client Stream Interceptor - Similar to their Unary counterpart, these intercept and process/transform each message being streamed from the client to the server. A great use case for this could be an interceptor/agent that may collect multiple messages and collect them in a window before forwarding them to the server (e.g., logs or metrics). Server Stream Interceptor - Similar to their Unary counterpart, these intercept messages in a single connection when received at the server. Interceptors provide more benefits than plain HTTP middleware: HTTP middleware is very language/framework specific so each framework has its own conventions for creating/enforcing this. HTTP middleware has no standard ways to decorate streams (e.g., WebSocket packets). Since gRPC offers framing in streaming messages, stream interceptors can intercept individual messages in a stream. In HTTP (or WebSockets), the lack of a "typed message" stream means applications would have to implement their own framing of messages and decorator "schemas" to process these messages in arbitrary ways. Implementing Interceptors Our example does not (yet) have any streaming RPCs. We will only add unary interceptors for now, and add stream interceptors when we look at a future post on WebSockets and streaming. First, we will add a Client Unary Interceptor to our service clients (invoked by the gRPC Gateway) to ensure that only requests that contain the auth header (with username + password) are forwarded to the server. Otherwise, the call to the server is not even made (and a 403 is returned). 
Then, we will add a Server Unary Interceptor to our service to accept and validate these credentials (after all - the server cannot just accept whatever the client sends at face value): Support basic HTTP auth in the gRPC Gateway so that the caller of our API can pass in a username/password to authenticate a user. The gRPC Gateway (HTTP server) extracts the username/password (from HTTP headers) and forwards it to the service (via gRPC metadata - see below). The Server Unary Interceptor validates this username/password against a static list of users/passwords. If the credentials are invalid, then the interceptor returns an error to the gRPC gateway (without invoking the gRPC handler). If the credentials are valid, the underlying service's handler is invoked. Clearly, this auth scheme is very simplistic and we will look at more full-fledged and complex examples in a future post on authentication. Now let us look at the implementation of each of these. Step 1: Extract Username/Password From HTTP Request Headers Our startGatewayServer method simply starts an HTTP server forwarding requests to the underlying gRPC service. Here, we also introduced the NewServeMux method in the grpc-gateway/v2/runtime module as a better replacement for the standard library's NewServeMux method due to its close understanding of the gRPC environment. Thus, the first step for us is to extract the auth-related HTTP headers from the incoming HTTP request and add them to the metadata that will be sent to the gRPC service. You can think of the metadata as the headers equivalent in the gRPC environment. These are simply key/value pairs. This is done below (in cmd/main.go):

import (
  ...
  ...
  // Add Imports
  "strings"

  "google.golang.org/grpc/codes"
  "google.golang.org/grpc/metadata"
  "google.golang.org/grpc/status"
  ...
  ...
)

...
...

func startGatewayServer(grpc_addr string, gw_addr string) {
  ctx := context.Background()

  //
  // Step 1 - Add extra options to NewServeMux
  //
  mux := runtime.NewServeMux(
    runtime.WithMetadata(func(ctx context.Context, request *http.Request) metadata.MD {
      //
      // Step 2 - Extend the context
      //
      ctx = metadata.AppendToOutgoingContext(ctx)

      //
      // Step 3 - get the basic auth params
      //
      if username, password, ok := request.BasicAuth(); ok {
        md := metadata.Pairs()
        md.Append("OneHubUsername", username)
        md.Append("OneHubPassword", password)
        return md
      } else {
        return nil
      }
    }))

  opts := []grpc.DialOption{grpc.WithInsecure()}
  ...
  ...
}

The additions are pretty minimal: We modify NewServeMux to include our first ServeMuxOption function (middleware). This ServeMuxOption function extracts username/password basic auth params from the headers. If the basic auth params are found, they are wrapped as two metadata pairs and returned (to be passed to the service). Step 2: Ensure Auth Params in Client Originating Metadata Here is our first Client Unary Interceptor, which, before forwarding a request to the gRPC service, will ensure that the OneHubUsername and OneHubPassword metadata pairs are set. Why even send an unauthenticated request to the service to begin with? Going back to our startGatewayServer method: once we are past ServeMux, it is time to configure our DialOptions. grpc.DialOption simply configures how a connection is to be made to the service. In our example so far, we just specified that we would like to configure our connection over an insecure transport (in a secure environment, the clients would also be issued certificates, etc. for authentication). 
A client interceptor can be added as an additional DialOption! That is it. A unary client interceptor is just a function with the following signature:

type UnaryClientInterceptor func(ctx context.Context,
  method string,             // Method to be invoked on the service (eg GetTopics)
  req,                       // Request payload (eg GetTopicsRequest)
  reply interface{},         // Response payload (eg GetTopicsResponse)
  cc *ClientConn,            // The underlying connection to the service
  invoker UnaryInvoker,      // The next handler
  opts ...CallOption) error

The signature is hopefully self-explanatory. The key parameter is the invoker, which is the "next" handler that must be called by the interceptor if the chain is to be continued. The interceptor can choose not to call the invoker and instead return a custom response or an error. Our client interceptor is simple. It will call the invoker if a username/password are present; otherwise, it will return an error:

func EnsureAuthExists(ctx context.Context,
  method string,             // Method to be invoked on the service (eg GetTopics)
  req,                       // Request payload (eg GetTopicsRequest)
  reply interface{},         // Response payload (eg GetTopicsResponse)
  cc *grpc.ClientConn,       // The underlying connection to the service
  invoker grpc.UnaryInvoker, // The next handler
  opts ...grpc.CallOption) error {
  md, ok := metadata.FromOutgoingContext(ctx)
  if ok {
    usernames := md.Get("OneHubUsername")
    passwords := md.Get("OneHubPassword")
    if len(usernames) > 0 && len(passwords) > 0 {
      username := strings.TrimSpace(usernames[0])
      password := strings.TrimSpace(passwords[0])
      if len(username) > 0 && len(password) > 0 {
        // All fine - just call the invoker
        return invoker(ctx, method, req, reply, cc, opts...)
      }
    }
  }
  return status.Error(codes.NotFound, "BasicAuth params not found")
}

Note that metadata entries are really key/value-list pairs (much like headers or query params in HTTP). Now all that is left is to add our interceptor to our DialOptions in the client:

func startGatewayServer(grpc_addr string, gw_addr string) {
  mux := ....
  opts := []grpc.DialOption{
    grpc.WithInsecure(),

    // Add our interceptor as a DialOption
    grpc.WithUnaryInterceptor(EnsureAuthExists),
  }
  ...
  ...
  ...
}

grpc.WithUnaryInterceptor takes a Unary Client Interceptor function and turns it into a DialOption. That's it! Now start the server again (go run cmd/server.go) and let us test calls to our chat service to see how this works. First, let us try an unauthenticated call:

$ curl localhost:8080/v1/topics
{"code":5,"message":"BasicAuth params not found","details":[]}

As expected, the call without basic auth headers was intercepted and rejected. Now let us try with a username/password:

$ curl localhost:8080/v1/topics -u login:password
{"topics":[], "nextPageKey":""}

Lo and behold: our request from the client was served by the server, though the request was not yet authenticated by the server. One thing to observe in the above examples is how the metadata object is created. It is created from the context - specifically, from the "outgoing" context. There are two associated contexts: the outgoing context for requests being sent and the incoming context for responses being received. The meanings of incoming and outgoing are reversed on the server side, where the request is incoming and the response is outgoing.

Step 3: Add Server-Side Authentication

While it is commendable that the client ensured the presence of BasicAuth credentials, it is up to the server to validate them.
To do this, we will add (as you guessed) a UnaryServerInterceptor, which is a function with the signature:

type UnaryServerInterceptor func(
  ctx context.Context,
  req interface{},
  info *UnaryServerInfo,
  handler UnaryHandler,
) (resp interface{}, err error)

This looks very similar to a UnaryClientInterceptor. The important parameters here are:

- info: Contains RPC-related information the interceptor can use and operate on
- handler: A wrapper over the service method implementation that is to be called by the interceptor (if the chain is to be continued)

For our server-side auth, we shall add a basic interceptor:

func EnsureAuthIsValid(ctx context.Context,
  req interface{},
  info *grpc.UnaryServerInfo,
  handler grpc.UnaryHandler) (resp interface{}, err error) {
  md, ok := metadata.FromIncomingContext(ctx)
  if ok {
    usernames := md.Get("OneHubUsername")
    passwords := md.Get("OneHubPassword")
    if len(usernames) > 0 && len(passwords) > 0 {
      username := strings.TrimSpace(usernames[0])
      password := strings.TrimSpace(passwords[0])

      // Make sure you use better passwords than this!
      if len(username) > 0 && password == fmt.Sprintf("%s123", username) {
        // All fine - just call the handler
        return handler(ctx, req)
      }
    }
  }
  return nil, status.Error(codes.NotFound, "Invalid username/password")
}

This is very similar to our client interceptor:

- Get the metadata from the incoming context (recall that on the client side, this was from the outgoing context).
- Ensure password == username + "123" (needless to say, we could do better here).
- If the passwords match, continue on. Otherwise, return an error.

We have one final step left: activating it. This is very similar to activating our client interceptor. The client interceptor was activated by passing our interceptor as a DialOption. The server interceptor is passed as a ServerOption to the NewServer method in the startGRPCServer function:

func startGRPCServer(addr string) {
  // create new gRPC server
  server := grpc.NewServer(
    grpc.UnaryInterceptor(EnsureAuthIsValid),
  )
  ...
  ...

Let us test it again now. Passing the previous login:password combo, while accepted by the client interceptor, should now get rejected by the server (note the different error message):

$ curl localhost:8080/v1/topics -u login:password
{"code":5, "message":"Invalid username/password", "details":[]}

Passing the right password fixes this:

$ curl localhost:8080/v1/topics -u login:login123
{"topics":[], "nextPageKey":""}

Step 4: Use Metadata

Up until now, our service methods have been shielded so that they won't even be called if an auth param was not passed or was invalid (albeit with a simple check for a "123" suffix). Sometimes it is necessary for the service methods to obtain and use this information. For example, when an entity is created, the service may want to enforce that the "creator" is set to the logged-in/authenticated user instead of an arbitrary value passed by the caller. This is quite simple.
Let us take the CreateTopic method:

func (s *TopicService) CreateTopic(ctx context.Context, req *protos.CreateTopicRequest) (resp *protos.CreateTopicResponse, err error) {
  resp = &protos.CreateTopicResponse{}
  resp.Topic = s.EntityStore.Create(req.Topic)
  return
}

It can now use the auth info passed in via the interceptors:

func (s *TopicService) CreateTopic(ctx context.Context, req *protos.CreateTopicRequest) (resp *protos.CreateTopicResponse, err error) {
  resp = &protos.CreateTopicResponse{}
  req.Topic.CreatorId = GetAuthedUser(ctx)
  if req.Topic.CreatorId == "" {
    return nil, status.Error(codes.PermissionDenied, "User is not authenticated to create a topic")
  }
  resp.Topic = s.EntityStore.Create(req.Topic)
  return
}

If we try to create a topic, any custom creator_id will be overwritten by the ID of the logged-in user:

curl -X POST localhost:8080/v1/topics \
  -u auser:auser123 \
  -H 'Content-Type: application/json' \
  -d '{"topic": {"name": "First Topic", "creator_id": "user1"}}' | json_pp

Yielding:

{
  "topic" : {
    "createdAt" : "2023-08-04T08:52:52.861406Z",
    "creatorId" : "auser",
    "id" : "1",
    "name" : "First Topic",
    "updatedAt" : "2023-08-04T08:52:52.861407Z",
    "users" : []
  }
}

That's it. That's all there is to interceptors. Stream interceptors are very similar, but we won't cover them here just yet. Wait for it, though!

Conclusion

By using interceptors, a service can be wrapped/decorated with a lot of common/cross-cutting capabilities in a way that is transparent to the underlying service (and method handlers). This allows separation of concerns as well as the ability to plug/play/replace these common behaviors with other providers in the future. Some of the interesting things that can be done with interceptors are logging, request tracing, authentication, rate-limiting, load balancing, and much more (a small chaining sketch follows below). To summarize, in this article we:

- Contrasted HTTP middleware and gRPC interceptors
- Touched upon the versatility of interceptors in providing a wide variety of functionality
- Implemented unary interceptors to decorate requests on both the client and the server side to provide a simple authentication mechanism

In the next post, we will finally start persisting our data in a real database. We will also containerize our whole setup and environment for easy development, portability, and packaging. This will also pave the way for keeping development/startup simple as we add more services for different extensions on our canonical chat service!
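As a parting sketch of the chaining mentioned above: several client interceptors can be combined with grpc.WithChainUnaryInterceptor, which runs them in the order given. The LogLatency interceptor below is purely illustrative (it is not part of this series' codebase); EnsureAuthExists is the interceptor we built earlier, and the assumed imports are "context", "log", "time", and "google.golang.org/grpc".

// LogLatency is an illustrative client interceptor: it times each outgoing
// RPC and logs the method name, duration, and error before returning.
func LogLatency(ctx context.Context, method string,
  req, reply interface{},
  cc *grpc.ClientConn,
  invoker grpc.UnaryInvoker,
  opts ...grpc.CallOption) error {
  start := time.Now()
  err := invoker(ctx, method, req, reply, cc, opts...)
  log.Printf("method=%s duration=%s err=%v", method, time.Since(start), err)
  return err
}

// Inside startGatewayServer: interceptors run in the order listed,
// so logging wraps the auth check here.
opts := []grpc.DialOption{
  grpc.WithInsecure(),
  grpc.WithChainUnaryInterceptor(LogLatency, EnsureAuthExists),
}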
This article presents an in-depth analysis of the service mesh landscape, focusing specifically on Istio, one of the most popular service mesh frameworks. A service mesh is a dedicated infrastructure layer for managing service-to-service communication in the world of microservices. Istio, built to seamlessly integrate with platforms like Kubernetes, provides a robust way to connect, secure, control, and observe services. This article explores Istio's architecture, its key features, and the value it provides in managing microservices at scale.

Service Mesh

A Kubernetes service mesh is a tool that improves the security, monitoring, and reliability of applications on Kubernetes. It manages communication between microservices and simplifies the complex network environment. By deploying network proxies alongside application code, the service mesh controls the data plane. This combination of Kubernetes and service mesh is particularly beneficial for cloud-native applications with many services and instances. The service mesh ensures reliable and secure communication, allowing developers to focus on core application development.

A Kubernetes service mesh, like any service mesh, simplifies how distributed applications communicate with each other. It acts as a layer of infrastructure that manages and controls this communication, abstracting away the complexity from individual services. Just like a tracking and routing service for packages, a Kubernetes service mesh tracks and directs traffic based on rules to ensure reliable and efficient communication between services.

A service mesh consists of a data plane and a control plane. The data plane includes lightweight proxies deployed alongside application code, handling the actual service-to-service communication. The control plane configures these proxies, manages policies, and provides additional capabilities such as tracing and metrics collection. With a Kubernetes service mesh, developers can separate their application's logic from the infrastructure that handles security and observability, enabling secure and monitored communication between microservices. It also supports advanced deployment strategies and integrates with monitoring tools for better operational control.

Istio as a Service Mesh

Istio is a popular open-source service mesh that has gained significant adoption among major tech companies like Google, IBM, and Lyft. It leverages the data plane and control plane architecture common to all service meshes, with its data plane consisting of Envoy proxies deployed as sidecars within Kubernetes pods. The data plane in Istio is responsible for managing traffic, implementing fault injection for specific protocols, and providing application-layer load balancing. This application-layer load balancing differs from the transport-layer load balancing in Kubernetes. Additionally, Istio includes components for collecting metrics, enforcing access control, authentication, and authorization, as well as integrating with monitoring and logging systems. It also supports encryption, authentication policies, and role-based access control through features like TLS authentication.

(Figure: Istio architecture diagram.)

(Figure: Istio configuration and data flow diagram.)

Furthermore, Istio can be extended with various tools to enhance its functionality and integrate with other systems. This allows users to customize and expand the capabilities of their Istio service mesh based on their specific requirements.
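As a brief, hedged aside on how that sidecar data plane typically gets wired up (assuming a standard Istio installation with istioctl available on the PATH), automatic Envoy injection is enabled per namespace:

Commands in Bash

# Install Istio (the demo profile is convenient for evaluation)
istioctl install --set profile=demo -y

# Label a namespace so that new pods automatically receive an Envoy sidecar
kubectl label namespace default istio-injection=enabled

# Confirm the label took effect
kubectl get namespace default --show-labels

From then on, any pod scheduled in the labeled namespace is transparently routed through its sidecar proxy, which is what the traffic management, observability, and security features below build upon.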
Traffic Management

Istio offers traffic routing features that have a significant impact on performance and facilitate effective deployment strategies. These features allow precise control over the flow of traffic and API calls within a single cluster and across clusters.

Within a single cluster, Istio's traffic routing rules enable efficient distribution of requests between services based on factors like load balancing algorithms, service versions, or user-defined rules. This ensures optimal performance by evenly distributing requests and dynamically adjusting routing based on service health and availability.

Routing traffic across clusters enhances scalability and fault tolerance. Istio provides configuration options for traffic routing across clusters, including round-robin, least connections, or custom rules. This capability allows traffic to be directed to different clusters based on factors such as network proximity, resource utilization, or specific business requirements.

In addition to performance optimization, Istio's traffic routing rules support advanced deployment strategies. A/B testing enables the routing of a certain percentage of traffic to a new service version while serving the majority of traffic to the existing version. Canary deployments involve gradually shifting traffic from an old version to a new version, allowing for monitoring and potential rollbacks. Staged rollouts incrementally increase traffic to a new version, enabling precise control and monitoring of the deployment process.

Furthermore, Istio simplifies the configuration of service-level properties like circuit breakers, timeouts, and retries. Circuit breakers prevent cascading failures by redirecting traffic when a specified error threshold is reached. Timeouts and retries handle network delays or transient failures by defining response waiting times and the number of request retries.

In summary, Istio's traffic routing capabilities provide a flexible and powerful means to control traffic and API calls, improving performance and facilitating advanced deployment strategies such as A/B testing, canary deployments, and staged rollouts.

The following is a code sample that demonstrates how to use Istio's traffic routing features in Kubernetes using Istio VirtualService and DestinationRule resources. In the code below, we define a VirtualService named my-service with the host my-service.example.com. We configure traffic routing by specifying two routes: one to the v1 subset of the my-service destination and another to the v2 subset. We assign different weights to each route to control the proportion of traffic they receive. The DestinationRule resource defines subsets for the my-service destination, allowing us to route traffic to different versions of the service based on labels. In this example, we have subsets for versions v1 and v2.

Code Sample YAML

# Example VirtualService configuration
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service.example.com
  http:
    - route:
        - destination:
            host: my-service
            subset: v1
          weight: 90
        - destination:
            host: my-service
            subset: v2
          weight: 10
---
# Example DestinationRule configuration
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2

Observability

As the complexity of services grows, it becomes increasingly challenging to comprehend their behavior and performance.
Istio addresses this challenge by automatically generating detailed telemetry for all communications within a service mesh. This telemetry includes metrics, distributed traces, and access logs, providing comprehensive observability into the behavior of services.

With Istio, operators can easily access and analyze metrics that capture various aspects of service performance, such as request rates, latency, and error rates. These metrics offer valuable insights into the health and efficiency of services, allowing operators to proactively identify and address performance issues.

Distributed tracing in Istio enables the capturing and correlation of trace spans across multiple services involved in a request. This provides a holistic view of the entire request flow, allowing operators to understand the latency and dependencies between services. With this information, operators can pinpoint bottlenecks and optimize the performance of their applications.

Full access logs provided by Istio capture detailed information about each request, including headers, payloads, and response codes. These logs offer a comprehensive audit trail of service interactions, enabling operators to investigate issues, debug problems, and ensure compliance with security and regulatory requirements.

The telemetry generated by Istio is instrumental in empowering operators to troubleshoot, maintain, and optimize their applications. It provides a deep understanding of how services interact, allowing operators to make data-driven decisions and take proactive measures to improve performance and reliability. Furthermore, Istio's telemetry capabilities are seamlessly integrated into the service mesh without requiring any modifications to the application code, making it a powerful and convenient tool for observability.

Istio automatically generates telemetry for all communications within a service mesh, including metrics, distributed traces, and access logs. Here's an example of how you can access metrics and logs using Istio:

Commands in Bash

# Access metrics:
istioctl dashboard kiali

# Access distributed traces:
istioctl dashboard jaeger

# Access access logs:
kubectl logs -l istio=ingressgateway -n istio-system

In the commands above, we use the istioctl command-line tool to access Istio's observability dashboards. The istioctl dashboard kiali command opens the Kiali dashboard, which provides a visual representation of the service mesh and allows you to view metrics such as request rates, latency, and error rates. The istioctl dashboard jaeger command opens the Jaeger dashboard, which allows you to view distributed traces and analyze the latency and dependencies between services. To access access logs, we use the kubectl logs command to retrieve logs from the Istio Ingress Gateway. By filtering logs with the label istio=ingressgateway and specifying the namespace istio-system, we can view detailed information about each request, including headers, payloads, and response codes.

By leveraging these observability features provided by Istio, operators can gain deep insights into the behavior and performance of their services. This allows them to troubleshoot issues, optimize performance, and ensure the reliability of their applications.

Security Capabilities

Microservices have specific security requirements, such as protecting against man-in-the-middle attacks, implementing flexible access controls, and enabling auditing tools. Istio addresses these needs with its comprehensive security solution.
Istio's security model follows a "security-by-default" approach, providing in-depth defense for deploying secure applications across untrusted networks. It ensures strong identity management, authenticating and authorizing services within the service mesh to prevent unauthorized access and enhance security.

Transparent TLS encryption is a crucial component of Istio's security framework. It encrypts all communication within the service mesh, safeguarding data from eavesdropping and tampering. Istio manages certificate rotation automatically, simplifying the maintenance of a secure communication channel between services.

Istio also offers powerful policy enforcement capabilities, allowing operators to define fine-grained access controls and policies for service communication. These policies can be dynamically enforced and updated without modifying the application code, providing flexibility in managing access and ensuring secure communication.

With Istio, operators have access to authentication, authorization, and audit (AAA) tools. Istio supports various authentication mechanisms, including mutual TLS, JSON Web Tokens (JWT), and OAuth2, ensuring secure authentication of clients and services (a minimal JWT sketch follows at the end of this section). Additionally, comprehensive auditing capabilities help operators track service behavior, comply with regulations, and detect potential security incidents.

In summary, Istio's security solution addresses the specific security requirements of microservices, providing strong identity management, transparent TLS encryption, policy enforcement, and AAA tools. It enables operators to deploy secure applications and protect services and data within the service mesh.

Code Sample YAML

# Example DestinationRule for mutual TLS authentication
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-service
spec:
  host: my-service
  trafficPolicy:
    tls:
      mode: MUTUAL
      clientCertificate: /etc/certs/client.pem
      privateKey: /etc/certs/private.key
      caCertificates: /etc/certs/ca.pem
---
# Example AuthorizationPolicy for access control
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: my-service-access
spec:
  selector:
    matchLabels:
      app: my-service
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/default/sa/my-allowed-service-account"]
      to:
        - operation:
            methods: ["*"]

In the code above, we configure mutual TLS authentication for the my-service destination using a DestinationRule resource. We set the mode to MUTUAL to enforce mutual TLS authentication between clients and the service. The clientCertificate, privateKey, and caCertificates fields specify the paths to the client certificate, private key, and CA certificate, respectively.

We also define an AuthorizationPolicy resource to control access to my-service based on the source service account. In this example, we allow requests from the my-allowed-service-account service account in the default namespace by specifying its principal in the principals field.

By applying these configurations to an Istio-enabled Kubernetes cluster, you can enhance the security of your microservices by enforcing mutual TLS authentication and implementing fine-grained access controls.
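As promised above, here is a hedged sketch of layering JWT-based end-user authentication on top via Istio's RequestAuthentication resource; the issuer and jwksUri values are placeholders, not real endpoints:

Code Sample YAML

# Hypothetical JWT rule; issuer and jwksUri are placeholder values
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: my-service-jwt
spec:
  selector:
    matchLabels:
      app: my-service
  jwtRules:
    - issuer: "https://issuer.example.com"
      jwksUri: "https://issuer.example.com/.well-known/jwks.json"

Note that a RequestAuthentication by itself only rejects requests carrying an invalid token; pairing it with an AuthorizationPolicy that requires a request principal is what rejects requests with no token at all.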
Circuit Breaking and Retry

Circuit breaking and retries are crucial techniques in building resilient distributed systems, especially in microservices architectures. Circuit breaking prevents cascading failures by stopping requests to a service experiencing errors or high latency. Istio's CircuitBreaker resource allows you to define thresholds for failed requests and other error conditions, ensuring that the circuit opens and stops further degradation when these thresholds are crossed. This isolation protects other services from being affected. Additionally, Istio's Retry resource enables automatic retries of failed requests, with customizable backoff strategies, timeout periods, and triggering conditions. By retrying failed requests, transient failures can be handled effectively, increasing the chances of success. Combining circuit breaking and retries enhances the resilience of microservices, isolating failing services and providing resilient handling of intermittent issues. Configuration of circuit breaking and retries in Istio is done within the VirtualService resource, allowing for customization based on specific requirements. Overall, leveraging these features in Istio is essential for building robust and resilient microservices architectures, protecting against failures, and maintaining system reliability.

In the code below, we configure circuit breaking and retries for my-service using the VirtualService resource. The retries section specifies that failed requests should be retried up to 3 times with a per-try timeout of 2 seconds. The retryOn field specifies the conditions under which retries should be triggered, such as 5xx server errors or connect failures. The fault section configures fault injection for the service. In this example, we introduce a fixed delay of 5 seconds for 50% of the requests and abort 10% of the requests with a 503 HTTP status code. The circuitBreaker section defines the circuit-breaking thresholds for the service. The example configuration sets the maximum number of connections to 100, maximum HTTP requests to 100, maximum pending requests to 10, the sleep window to 5 seconds, and the HTTP detection interval to 10 seconds. By applying this configuration to an Istio-enabled Kubernetes cluster, you can enable circuit breaking and retries for your microservices, enhancing resilience and preventing cascading failures.

Code Sample YAML

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
            subset: v1
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure
      fault:
        delay:
          fixedDelay: 5s
          percentage:
            value: 50
        abort:
          httpStatus: 503
          percentage:
            value: 10
      circuitBreaker:
        simpleCb:
          maxConnections: 100
          httpMaxRequests: 100
          httpMaxPendingRequests: 10
          sleepWindow: 5s
          httpDetectionInterval: 10s

Canary Deployments

Canary deployments with Istio offer a powerful strategy for releasing new features or updates to a subset of users or traffic while minimizing the risk of impacting the entire system. With Istio's traffic management capabilities, you can easily implement canary deployments by directing a fraction of the traffic to the new version or feature. Istio's VirtualService resource allows you to define routing rules based on percentages, HTTP headers, or other criteria to selectively route traffic. By gradually increasing the traffic to the canary version, you can monitor its performance and gather feedback before rolling it out to the entire user base. Istio also provides powerful observability features, such as distributed tracing and metrics collection, allowing you to closely monitor the canary deployment and make data-driven decisions.
In case of any issues or anomalies, you can quickly roll back to the stable version or implement other remediation strategies, minimizing the impact on users. Canary deployments with Istio provide a controlled and gradual approach to releasing new features, ensuring that changes are thoroughly tested and validated before impacting the entire system, thus improving the overall reliability and stability of your applications.

To implement canary deployments with Istio, we can use the VirtualService resource to define routing rules and gradually shift traffic to the canary version.

Code Sample YAML

apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service
spec:
  hosts:
    - my-service
  http:
    - route:
        - destination:
            host: my-service
            subset: stable
          weight: 90
        - destination:
            host: my-service
            subset: canary
          weight: 10

In the code above, we configure the VirtualService to route 90% of the traffic to the stable version of the service (subset: stable) and 10% of the traffic to the canary version (subset: canary). The weight field specifies the distribution of traffic between the subsets. By applying this configuration, you can gradually increase the traffic to the canary version and monitor its behavior and performance. Istio's observability features, such as distributed tracing and metrics collection, can provide insights into the canary deployment's behavior and impact. If any issues or anomalies are detected, you can quickly roll back to the stable version by adjusting the traffic weights or implementing other remediation strategies. By leveraging Istio's traffic management capabilities, you can safely release new features or updates, gather feedback, and mitigate risks before fully rolling them out to your user base.

Autoscaling

Istio seamlessly integrates with Kubernetes' Horizontal Pod Autoscaler (HPA) to enable automated scaling of microservices based on various metrics, such as CPU or memory usage. By configuring Istio's metrics collection and setting up the HPA, you can ensure that your microservices scale dynamically in response to increased traffic or resource demands. Istio's metrics collection capabilities allow you to gather detailed insights into the performance and resource utilization of your microservices. These metrics can then be used by the HPA to make informed scaling decisions. The HPA continuously monitors the metrics and adjusts the number of replicas for a given microservice based on predefined scaling rules and thresholds. When the defined thresholds are crossed, the HPA automatically scales the number of pods up or down, ensuring that the microservices can handle the current workload efficiently. This automated scaling approach eliminates the need for manual intervention and enables your microservices to adapt to fluctuating traffic patterns or resource demands in real time. By leveraging Istio's integration with Kubernetes' HPA, you can achieve optimal resource utilization, improve performance, and ensure the availability and scalability of your microservices.

Code Sample YAML

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50

In the example above, the HPA is configured to scale the my-service deployment based on CPU usage. The HPA will maintain an average CPU utilization of 50% across all pods.
By applying this configuration, Istio will collect metrics from your microservices, and the HPA will automatically adjust the number of replicas based on the defined scaling rules and thresholds. With this integration, your microservices can dynamically scale up or down based on traffic patterns and resource demands, ensuring optimal utilization of resources and improved performance. It's important to note that the Istio integration with Kubernetes' HPA may require additional configuration and tuning based on your specific requirements and monitoring setup.

Implementing Fault Injection and Chaos Testing With Istio

Chaos fault injection with Istio is a powerful technique that allows you to test the resilience and robustness of your microservices architecture. Istio provides built-in features for injecting faults and failures into your system, simulating real-world scenarios, and evaluating how well your system can handle them. With Istio's fault injection feature, you can introduce delays, errors, aborts, or latency spikes for specific requests or services. By configuring VirtualServices and DestinationRules, you can selectively apply fault injection based on criteria such as HTTP headers or paths (a hedged configuration sketch follows at the end of this section). By combining fault injection with observability features like distributed tracing and metrics collection, you can closely monitor the impact of injected faults on different services in real time. Chaos fault injection with Istio helps you identify weaknesses, validate error handling mechanisms, and build confidence in the resilience of your microservices architecture, ensuring the reliability and stability of your applications in production environments.

Securing External Traffic Using Istio's Ingress Gateway

Securing external traffic using Istio's Ingress Gateway is crucial for protecting your microservices architecture from unauthorized access and potential security threats. Istio's Ingress Gateway acts as the entry point for external traffic, providing a centralized and secure way to manage inbound connections. By configuring Istio's Ingress Gateway, you can enforce authentication, authorization, and encryption protocols to ensure that only authenticated and authorized traffic can access your microservices. Istio supports various authentication mechanisms such as JSON Web Tokens (JWT), mutual TLS (mTLS), and OAuth, allowing you to choose the most suitable method for your application's security requirements. Additionally, Istio's Ingress Gateway enables you to define fine-grained access control policies based on source IP, user identity, or other attributes, ensuring that only authorized clients can reach specific microservices. By leveraging Istio's powerful traffic management capabilities, you can also enforce secure communication between microservices within your architecture, preventing unauthorized access or eavesdropping. Overall, Istio's Ingress Gateway provides a robust and flexible solution for securing external traffic, protecting your microservices, and ensuring the integrity and confidentiality of your data and communications.

Code Sample YAML

apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: my-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"

In this example, we define a Gateway named my-gateway that listens on port 80 and accepts HTTP traffic from any host. The Gateway's selector is set to istio: ingressgateway, which ensures that it will be used as the Ingress Gateway for external traffic.
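Returning to the fault-injection sketch promised above, here is a minimal, hedged example of injecting both a delay and an abort into traffic for my-service; the percentages and status code are illustrative values, not recommendations:

Code Sample YAML

# Hypothetical fault-injection rule; delay/abort values are illustrative
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: my-service-faults
spec:
  hosts:
    - my-service
  http:
    - fault:
        delay:
          fixedDelay: 3s
          percentage:
            value: 25
        abort:
          httpStatus: 500
          percentage:
            value: 5
      route:
        - destination:
            host: my-service
            subset: stable

Observing how upstream callers behave under this induced latency and error rate is a cheap way to validate the timeout, retry, and circuit-breaking policies discussed earlier before a real incident exercises them.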
Best Practices for Managing and Operating Istio in Production Environments

When managing and operating Istio in production environments, there are several best practices to follow. First, it is essential to carefully plan and test your Istio deployment before the production rollout, ensuring compatibility with your specific application requirements and infrastructure. Properly monitor and observe your Istio deployment using Istio's built-in observability features, including distributed tracing, metrics, and logging. Regularly review and update Istio configurations to align with your evolving application needs and security requirements. Implement traffic management cautiously, starting with conservative traffic routing rules and gradually introducing more advanced features like traffic splitting and canary deployments. Take advantage of Istio's traffic control capabilities to implement circuit breaking, retries, and timeout policies to enhance the resilience of your microservices. Regularly update and patch your Istio installation to leverage the latest bug fixes, security patches, and feature enhancements. Lastly, establish a robust backup and disaster recovery strategy to mitigate potential risks and ensure business continuity. By adhering to these best practices, you can effectively manage and operate Istio in production environments, ensuring the reliability, security, and performance of your microservices architecture.

Conclusion

In the evolving landscape of service-to-service communication, Istio, as a service mesh, has surfaced as an integral component, offering a robust and flexible solution for managing complex communication between microservices in a distributed architecture. Istio's capabilities extend beyond merely facilitating communication to providing comprehensive traffic management, enabling sophisticated routing rules, retries, failovers, and fault injection. It also addresses security, a critical aspect in the microservices world, by implementing it at the infrastructure level, thereby reducing the burden on application code. Furthermore, Istio enhances observability in the system, allowing organizations to effectively monitor and troubleshoot their services.

Despite the steep learning curve associated with Istio, the multitude of benefits it offers makes it a worthy investment for organizations. The control and flexibility it provides over microservices are unparalleled. With the growing adoption of microservices, the role of service meshes like Istio is becoming increasingly pivotal, ensuring the reliable, secure operation of services and providing the scalability required in today's dynamic business environment.

In conclusion, Istio holds a significant position in the service mesh realm, offering a comprehensive solution for managing microservices at scale. It represents the ongoing evolution in service-to-service communication, driven by the need for more efficient, secure, and manageable solutions. The future of Istio and service meshes in general appears promising, with continuous research and development efforts aimed at strengthening and broadening their capabilities.
In this article, we will implement a microservice using Clean Architecture and CQRS. The tech stack uses Kotlin, Spring WebFlux with coroutines, PostgreSQL and MongoDB, Kafka as a message broker, and the Arrow-kt functional library, which, as its documentation says, brings idiomatic functional programming to Kotlin.

Clean Architecture

Clean Architecture is one of the more popular software design approaches. It follows the principles of Dependency Inversion, Single Responsibility, and Separation of Concerns. It consists of concentric circles representing different layers, with the innermost layer being the most abstract and the outermost layer representing the user interface and infrastructure. By separating the concerns of the various components and enforcing the dependency rule, it becomes much easier to understand and modify the code. Depending on abstractions allows you to design your business logic flexibly, without having to know the implementation details. The Domain Layer and the Application Layer are the core of Clean Architecture. These two layers together form the application core, encapsulating the most important business rules of the system. Clean Architecture is a domain-centric architectural approach that separates business logic from technical implementation details.

CQRS

CQRS stands for Command and Query Responsibility Segregation, a pattern that separates reads and writes into different models, using commands to update data and queries to read data. Using CQRS, you should have a strict separation between the write model and the read model. Those two models should be processed by separate objects and not be conceptually linked together. Those objects are not physical storage structures; they are, for example, command handlers and query handlers. They are not related to where and how the data will be stored: they are connected to the processing behavior. Command handlers are responsible for handling commands, mutating state, or performing other side effects. Query handlers are responsible for returning the result of the requested query. CQRS gives us:

- Scalability: It allows for independent scaling of read and write operations.
- Performance: By separating read and write operations, you can optimize each for performance. Reads can be optimized for fast retrieval by using denormalized data structures, caching, and specialized read models tailored to specific query needs.
- Flexibility: It allows us to model the read and write sides of the application differently, which provides flexibility in designing the data structures and processing logic to best suit the requirements of each operation. This flexibility can lead to a more efficient and maintainable system, especially in complex domains where the read and write requirements differ significantly.

One common misconception about CQRS is that the commands and queries should be run on separate databases. This isn't necessarily true; only the behaviors and responsibilities for both should be separated. This separation can live in the code, in the structure of the database, or in different databases.

Returning to Clean Architecture's Dependency Rule: nothing in an inner circle can know anything about something in an outer circle. In particular, the name of something declared in an outer circle must not be mentioned by the code in an inner circle. That includes functions, classes, variables, or any other named software entity.
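To make the command/query split concrete before diving into the implementation, here is a minimal, hypothetical sketch of separated handler abstractions; the names are illustrative and not taken from the project:

Kotlin

// Illustrative only: commands mutate state, queries read it, and the two
// sides are modeled and handled by separate abstractions.
data class DepositBalanceCommand(val accountId: String, val amount: Long)
data class GetAccountByIdQuery(val accountId: String)

interface CommandHandler<in C> {
    suspend fun handle(command: C)
}

interface QueryHandler<in Q, out R> {
    suspend fun handle(query: Q): R
}

The real service below follows the same spirit: AccountCommandService mutates the PostgreSQL write model, while AccountQueryService serves reads from the MongoDB read model.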
In the real world, the understanding of Clean Architecture can vary from person to person. Since Clean Architecture emphasizes principles such as separation of concerns, dependency inversion, and abstraction layers, different developers may interpret and implement these principles differently based on their own experiences, knowledge, and project requirements. This article shows my personal view of one of the possible ways of implementation. Ultimately, the goal of Clean Architecture is to create software systems that are maintainable, scalable, and easy to understand.

Layers

Presentation Layer

The Presentation Layer (named api here) is the outermost layer and the entry point to our system. The most important part of the presentation layer is the controllers, which define the API endpoints our system presents to the outside world and are responsible for:

- Handling interaction with the outside world
- Presenting, displaying, or returning responses with the data
- Translating the outside request data (mapping requests to application layer commands)
- Working with framework-specific configuration setup
- Working on top of the application layer

Let's look at the full path of a command request through the microservice. First things first: it accepts REST HTTP requests, validates the input, checks credentials if the endpoint is secured, and so on; then it maps the request DTO to a command and calls the AccountCommandService handle method. For example, let's look at the call flow of the create account and deposit balance commands:

Kotlin

@Tag(name = "Accounts", description = "Account domain REST endpoints")
@RestController
@RequestMapping(path = ["/api/v1/accounts"])
class AccountController(
    private val accountCommandService: AccountCommandService,
    private val accountQueryService: AccountQueryService
) {

    @Operation(
        method = "createAccount",
        operationId = "createAccount",
        description = "Create new Account",
        responses = [
            ApiResponse(
                description = "Create new Account",
                responseCode = "201",
                content = [Content(
                    mediaType = MediaType.APPLICATION_JSON_VALUE,
                    schema = Schema(implementation = AccountId::class)
                )]
            ),
            ApiResponse(
                description = "bad request response",
                responseCode = "400",
                content = [Content(schema = Schema(implementation = ErrorHttpResponse::class))]
            )],
    )
    @PostMapping
    suspend fun createAccount(
        @Valid @RequestBody request: CreateAccountRequest
    ): ResponseEntity<out Any> = eitherScope(ctx) {
        accountCommandService.handle(request.toCommand()).bind()
    }.fold(
        ifLeft = { mapErrorToResponse(it) },
        ifRight = { ResponseEntity.status(HttpStatus.CREATED).body(it) }
    )

    @Operation(
        method = "depositBalance",
        operationId = "depositBalance",
        description = "Deposit balance",
        responses = [
            ApiResponse(
                description = "Deposit balance",
                responseCode = "200",
                content = [Content(
                    mediaType = MediaType.APPLICATION_JSON_VALUE,
                    schema = Schema(implementation = BaseResponse::class)
                )]
            ),
            ApiResponse(
                description = "bad request response",
                responseCode = "400",
                content = [Content(schema = Schema(implementation = ErrorHttpResponse::class))]
            )],
    )
    @PutMapping(path = ["/{id}/deposit"])
    suspend fun depositBalance(
        @PathVariable id: UUID,
        @Valid @RequestBody request: DepositBalanceRequest
    ): ResponseEntity<out Any> = eitherScope(ctx) {
        accountCommandService.handle(request.toCommand(AccountId(id))).bind()
    }.fold(
        ifLeft = { mapErrorToResponse(it) },
        ifRight = { okResponse(it) }
    )
}

Application and Domain Layers

The Application Layer contains the use cases of the application. A use case represents a specific interaction or action that the system can perform. Each use case is implemented as a command or a query.
Together with the Domain Layer, it forms the application core, and it is responsible for:

- Executing the application use cases (all the actions and commands allowed to be done with the system)
- Fetching domain objects
- Manipulating domain objects

The Application Layer's AccountCommandService holds the business logic: it runs the required business rule validations, then applies changes to the domain aggregate, persists domain objects in the database, produces the domain events, and persists them in the outbox table, all within one single transaction. The current application uses a small, optional optimization for outbox publishing: after the command service commits the transaction, we publish the event immediately, but we don't care if this publish fails, because the polling publisher (driven by the Spring scheduler) will process it anyway.

Arrow greatly improves the developer experience, because Kotlin doesn't ship an Either type with the standard SDK. Either is an entity whose value can be of two different types, called left and right. By convention, the right is for the success case and the left is for the error one. It allows us to express the fact that a call might return a correct value or an error, and to differentiate between the two of them. The left/right naming pattern is just a convention. Either is a great way to make the error handling in your code more explicit. Making the code more explicit reduces the amount of context that you need to keep in your head, which in turn makes the code easier to understand.

Kotlin

interface AccountCommandService {
    suspend fun handle(command: CreateAccountCommand): Either<AppError, AccountId>
    suspend fun handle(command: ChangeAccountStatusCommand): Either<AppError, Unit>
    suspend fun handle(command: ChangeContactInfoCommand): Either<AppError, Unit>
    suspend fun handle(command: DepositBalanceCommand): Either<AppError, Unit>
    suspend fun handle(command: WithdrawBalanceCommand): Either<AppError, Unit>
    suspend fun handle(command: UpdatePersonalInfoCommand): Either<AppError, Unit>
}

@Service
class AccountCommandServiceImpl(
    private val accountRepository: AccountRepository,
    private val outboxRepository: OutboxRepository,
    private val tx: TransactionalOperator,
    private val eventPublisher: EventPublisher,
    private val serializer: Serializer,
    private val emailVerifierClient: EmailVerifierClient,
    private val paymentClient: PaymentClient
) : AccountCommandService {

    override suspend fun handle(command: CreateAccountCommand): Either<AppError, AccountId> = eitherScope(ctx) {
        emailVerifierClient.verifyEmail(command.contactInfo.email).bind()

        val (account, event) = tx.executeAndAwait {
            val account = accountRepository.save(command.toAccount()).bind()
            val event = outboxRepository.insert(account.toAccountCreatedOutboxEvent(serializer)).bind()
            account to event
        }

        publisherScope.launch { publishOutboxEvent(event) }
        account.accountId
    }

    override suspend fun handle(command: DepositBalanceCommand): Either<AppError, Unit> = eitherScope(ctx) {
        paymentClient.verifyPaymentTransaction(command.accountId.string(), command.transactionId).bind()

        val event = tx.executeAndAwait {
            val foundAccount = accountRepository.getById(command.accountId).bind()
            foundAccount.depositBalance(command.balance).bind()
            val account = accountRepository.update(foundAccount).bind()
            val event = account.toBalanceDepositedOutboxEvent(command.balance, serializer)
            outboxRepository.insert(event).bind()
        }

        publisherScope.launch { publishOutboxEvent(event) }
    }
}

The Domain Layer encapsulates the most important business rules of the system.
It is the place where we have to start building the core business rules; in a domain-centric architecture, we start developing from the domain. The responsibilities of the Domain Layer are as follows:

- Defining domain models
- Defining rules and domain/business errors
- Executing the application business logic
- Enforcing the business rules

Domain models have data and behavior and represent the domain. We have two approaches for designing them: rich and anemic domain models. Anemic models allow external manipulation of their data, which is usually an antipattern, because the domain object itself doesn't control its own data. Rich domain models contain both data and behavior. The richer the behavior, the richer the domain model. A rich model exposes only a specific set of public methods, which allows data to be manipulated only in ways the domain approves, encapsulates logic, and performs validations. Rich domain model properties are read-only by default.

A domain model can be designed to be always valid or not; it's better to prefer always-valid domain models, meaning they are in a valid state all the time. Then, at any point in time when we're working with domain state, we know it's valid and don't need to write additional validations to check it. One more important detail is Persistence Ignorance: modeling the domain without taking into account how domain objects will be persisted.

Kotlin

class Account(
    val accountId: AccountId = AccountId(),
) {
    var contactInfo: ContactInfo = ContactInfo()
        private set
    var personalInfo: PersonalInfo = PersonalInfo()
        private set
    var address: Address = Address()
        private set
    var balance: Balance = Balance()
        private set
    var status: AccountStatus = AccountStatus.FREE
        private set
    var version: Long = 0
        private set
    var updatedAt: Instant? = null
        private set
    var createdAt: Instant? = null
        private set

    fun depositBalance(newBalance: Balance): Either<AppError, Account> = either {
        if (balance.balanceCurrency != newBalance.balanceCurrency)
            raise(InvalidBalanceCurrency("invalid currency: $newBalance"))
        if (newBalance.amount < 0)
            raise(InvalidBalanceAmount("invalid balance amount: $newBalance"))
        balance = balance.copy(amount = (balance.amount + newBalance.amount))
        updatedAt = Instant.now()
        this@Account
    }

    fun withdrawBalance(newBalance: Balance): Either<AppError, Account> = either {
        if (balance.balanceCurrency != newBalance.balanceCurrency)
            raise(InvalidBalanceCurrency("invalid currency: $newBalance"))
        if (newBalance.amount < 0)
            raise(InvalidBalanceAmount("invalid balance amount: $newBalance"))
        val newAmount = (balance.amount - newBalance.amount)
        if (newAmount < 0) raise(InvalidBalanceError("invalid balance: $newBalance"))
        balance = balance.copy(amount = newAmount)
        updatedAt = Instant.now()
        this@Account
    }

    fun updateStatus(newStatus: AccountStatus): Either<AppError, Account> = either {
        status = newStatus
        updatedAt = Instant.now()
        this@Account
    }

    fun changeContactInfo(newContactInfo: ContactInfo): Either<AppError, Account> = either {
        contactInfo = newContactInfo
        updatedAt = Instant.now()
        this@Account
    }

    fun changeAddress(newAddress: Address): Either<AppError, Account> = either {
        address = newAddress
        updatedAt = Instant.now()
        this@Account
    }

    fun changePersonalInfo(newPersonalInfo: PersonalInfo): Either<AppError, Account> = either {
        personalInfo = newPersonalInfo
        updatedAt = Instant.now()
        this@Account
    }

    fun incVersion(amount: Long = 1): Either<AppError, Account> = either {
        if (amount < 1) raise(InvalidVersion("invalid version: $amount"))
        version += amount
        updatedAt = Instant.now()
        this@Account
    }

    fun withVersion(amount: Long = 1): Account {
        version = amount
        updatedAt = Instant.now()
        return this
    }

    fun decVersion(amount: Long = 1): Either<AppError, Account> = either {
        if (amount < 1) raise(InvalidVersion("invalid version: $amount"))
        version -= amount
        updatedAt = Instant.now()
        this@Account
    }

    fun withUpdatedAt(newValue: Instant): Account {
        updatedAt = newValue
        return this
    }
}

Infrastructure Layer

Next is the Infrastructure Layer, which contains implementations for external-facing services and is responsible for:

- Interacting with the persistence solution
- Interacting with other services (HTTP or gRPC clients, message brokers, etc.)
- Actual implementations of the interfaces from the application layer
- Identity concerns

At the Infrastructure Layer, we have the implementations of the Application Layer interfaces. The main write database is PostgreSQL, accessed with the reactive R2DBC driver and DatabaseClient with raw SQL queries. If we wanted to use ORM entities, we would still pass domain objects through the other layers' interfaces anyway, and then map them to the ORM entities inside the repository implementation. For this project, we keep the Spring annotations as is; if we wanted a cleaner implementation, it would be possible to move them to another layer. In this example project, the SQL schema is simplified and not normalized.
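For orientation only, here is a rough, hypothetical sketch of what such a simplified accounts table might look like; the column names are inferred from the domain model above and are not the project's actual DDL:

SQL

-- Hypothetical simplified schema; not the project's actual DDL
CREATE TABLE IF NOT EXISTS accounts
(
    account_id       UUID PRIMARY KEY,
    email            VARCHAR(255) UNIQUE NOT NULL,
    balance_amount   BIGINT              NOT NULL DEFAULT 0,
    balance_currency VARCHAR(3)          NOT NULL,
    status           VARCHAR(32)         NOT NULL,
    version          BIGINT              NOT NULL DEFAULT 1,
    created_at       TIMESTAMP           NOT NULL DEFAULT NOW(),
    updated_at       TIMESTAMP           NOT NULL DEFAULT NOW()
);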
Kotlin

interface AccountRepository {
    suspend fun getById(id: AccountId): Either<AppError, Account>
    suspend fun save(account: Account): Either<AppError, Account>
    suspend fun update(account: Account): Either<AppError, Account>
}

@Repository
class AccountRepositoryImpl(
    private val dbClient: DatabaseClient
) : AccountRepository {

    override suspend fun save(account: Account): Either<AppError, Account> =
        eitherScope<AppError, Account>(ctx) {
            dbClient.sql(INSERT_ACCOUNT_QUERY.trimMargin())
                .bindValues(account.withVersion(FIRST_VERSION).toPostgresEntityMap())
                .fetch()
                .rowsUpdated()
                .awaitSingle()
            account
        }

    override suspend fun update(account: Account): Either<AppError, Account> =
        eitherScope(ctx) {
            dbClient.sql(OPTIMISTIC_UPDATE_QUERY.trimMargin())
                .bindValues(account.withUpdatedAt(Instant.now()).toPostgresEntityMap(withOptimisticLock = true))
                .fetch()
                .rowsUpdated()
                .awaitSingle()
            account.incVersion().bind()
        }

    override suspend fun getById(id: AccountId): Either<AppError, Account> =
        eitherScope(ctx) {
            dbClient.sql(GET_ACCOUNT_BY_ID_QUERY.trimMargin())
                .bind(ID_FIELD, id.id)
                .map { row, _ -> row.toAccount() }
                .awaitSingleOrNull()
                ?: raise(AccountNotFoundError("account for id: $id not found"))
        }
}

Below is an important detail about the outbox repository implementation. To handle the case of multiple pod instances processing the outbox table in parallel, we of course have idempotent consumers; still, where we can, we should avoid processing the same outbox events more than once. To prevent multiple instances from selecting and publishing the same events, we use FOR UPDATE SKIP LOCKED. This combination works as follows: when one instance tries to select a batch of outbox events and another instance has already selected (and locked) some of those records, the first instance skips the locked records and selects the next available unlocked ones, and so on. But again, this is only my personally preferred way of implementing it; a polling publisher alone is usually the default. As a possible alternative, use Debezium (for example), but it's up to you.
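The query behind a constant like GET_OUTBOX_EVENTS_FOR_UPDATE_SKIP_LOCKED_QUERY might look roughly like the following; this is a hypothetical reconstruction from the behavior described above, not the project's exact SQL:

SQL

-- Hypothetical shape of the batched outbox select: locked rows are skipped,
-- so parallel instances never pick up the same events
SELECT event_id, event_type, aggregate_id, data, version, timestamp
FROM outbox_table
ORDER BY timestamp
LIMIT :limit
FOR UPDATE SKIP LOCKED

Each selected event is then published and deleted inside the same transaction, so a crash before the delete simply leaves the event visible to the next polling round.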
Kotlin

interface OutboxRepository {
    suspend fun insert(event: OutboxEvent): Either<AppError, OutboxEvent>

    suspend fun deleteWithLock(
        event: OutboxEvent,
        callback: suspend (event: OutboxEvent) -> Either<AppError, Unit>
    ): Either<AppError, OutboxEvent>

    suspend fun deleteEventsWithLock(
        batchSize: Int,
        callback: suspend (event: OutboxEvent) -> Either<AppError, Unit>
    ): Either<AppError, Unit>
}

@Component
class OutboxRepositoryImpl(
    private val dbClient: DatabaseClient,
    private val tx: TransactionalOperator
) : OutboxRepository {

    override suspend fun insert(event: OutboxEvent): Either<AppError, OutboxEvent> = eitherScope(ctx) {
        dbClient.sql(INSERT_OUTBOX_EVENT_QUERY.trimMargin())
            .bindValues(event.toPostgresValuesMap())
            .map { row, _ -> row.get(ROW_EVENT_ID, String::class.java) }
            .one()
            .awaitSingle()
            .let { event }
    }

    override suspend fun deleteWithLock(
        event: OutboxEvent,
        callback: suspend (event: OutboxEvent) -> Either<AppError, Unit>
    ): Either<AppError, OutboxEvent> = eitherScope {
        tx.executeAndAwait {
            dbClient.sql(GET_OUTBOX_EVENT_BY_ID_FOR_UPDATE_SKIP_LOCKED_QUERY.trimMargin())
                .bindValues(mutableMapOf(EVENT_ID to event.eventId))
                .map { row, _ -> row.get(ROW_EVENT_ID, String::class.java) }
                .one()
                .awaitSingleOrNull()

            callback(event).bind()
            deleteOutboxEvent(event).bind()
            event
        }
    }

    override suspend fun deleteEventsWithLock(
        batchSize: Int,
        callback: suspend (event: OutboxEvent) -> Either<AppError, Unit>
    ): Either<AppError, Unit> = eitherScope(ctx) {
        tx.executeAndAwait {
            dbClient.sql(GET_OUTBOX_EVENTS_FOR_UPDATE_SKIP_LOCKED_QUERY.trimMargin())
                .bind(LIMIT, batchSize)
                .map { row, _ -> row.toOutboxEvent() }
                .all()
                .asFlow()
                .onStart { log.info { "start publishing outbox events batch: $batchSize" } }
                .onEach { callback(it).bind() }
                .onEach { event -> deleteOutboxEvent(event).bind() }
                .onCompletion { log.info { "completed publishing outbox events batch: $batchSize" } }
                .collect()
        }
    }

    private suspend fun deleteOutboxEvent(event: OutboxEvent): Either<AppError, Long> = eitherScope(ctx) {
        dbClient.sql(DELETE_OUTBOX_EVENT_BY_ID_QUERY)
            .bindValues(mutableMapOf(EVENT_ID to event.eventId))
            .fetch()
            .rowsUpdated()
            .awaitSingle()
    }
}

The polling publisher implementation is a scheduled process that does the same publish-and-delete job at a given interval, as described earlier, and uses the same service method:

Kotlin

@Component
@ConditionalOnProperty(prefix = "schedulers", value = ["outbox.enable"], havingValue = "true")
class OutboxScheduler(
    private val outboxRepository: OutboxRepository,
    private val publisher: EventPublisher,
) {

    @Value("\${schedulers.outbox.batchSize}")
    private var batchSize: Int = 30

    @Scheduled(
        initialDelayString = "\${schedulers.outbox.initialDelayMillis}",
        fixedRateString = "\${schedulers.outbox.fixedRate}"
    )
    fun publishOutboxEvents() = runBlocking {
        eitherScope {
            outboxRepository.deleteEventsWithLock(batchSize) { publisher.publish(it) }.bind()
        }.fold(
            ifLeft = { err -> log.error { "error while publishing scheduler outbox events: $err" } },
            ifRight = { log.info { "outbox scheduler published events" } }
        )
    }
}

A domain event is something interesting, from a business point of view, that happened within the system; something that already occurred. We're capturing the fact that something happened in the system. After events have been published from the outbox table to the broker, this application consumes them from Kafka, and the consumers call the EventHandlerService methods, which build a read model for our domain aggregates.
The read model of a CQRS-based system provides materialized views of the data, typically as highly denormalized views. These views are tailored to the interfaces and display requirements of the application, which helps to maximize both display and query performance. For error handling and retries, we prefer to use separate retry topics and listeners. Using the stream of events as the write store, rather than the actual data at a point in time, avoids update conflicts on a single aggregate and maximizes performance and scalability. The events can be used to asynchronously generate the materialized views that populate the read store. As with any system where the write and read stores are separate, systems based on this pattern are only eventually consistent: there will be some delay between the event being generated and the data store being updated.

Here is the Kafka consumer implementation:

Kotlin

@Component
class BalanceDepositedEventConsumer(
    private val eventProcessor: EventProcessor,
    private val kafkaTopics: KafkaTopics
) {

    @KafkaListener(
        groupId = "\${kafka.consumer-group-id:account_microservice_group_id}",
        topics = ["\${topics.accountBalanceDeposited.name}"],
    )
    fun process(ack: Acknowledgment, record: ConsumerRecord<String, ByteArray>) =
        eventProcessor.process(
            ack = ack,
            consumerRecord = record,
            deserializationClazz = BalanceDepositedEvent::class.java,
            onError = eventProcessor.errorRetryHandler(kafkaTopics.accountBalanceDepositedRetry.name, DEFAULT_RETRY_COUNT)
        ) { event ->
            eventProcessor.on(
                ack = ack,
                consumerRecord = record,
                event = event,
                retryTopic = kafkaTopics.accountBalanceDepositedRetry.name
            )
        }

    @KafkaListener(
        groupId = "\${kafka.consumer-group-id:account_microservice_group_id}",
        topics = ["\${topics.accountBalanceDepositedRetry.name}"],
    )
    fun processRetry(ack: Acknowledgment, record: ConsumerRecord<String, ByteArray>) =
        eventProcessor.process(
            ack = ack,
            consumerRecord = record,
            deserializationClazz = BalanceDepositedEvent::class.java,
            onError = eventProcessor.errorRetryHandler(kafkaTopics.accountBalanceDepositedRetry.name, DEFAULT_RETRY_COUNT)
        ) { event ->
            eventProcessor.on(
                ack = ack,
                consumerRecord = record,
                event = event,
                retryTopic = kafkaTopics.accountBalanceDepositedRetry.name
            )
        }
}

At the application layer, AccountEventHandlerService is implemented in the following way:

Kotlin

interface AccountEventHandlerService {
    suspend fun on(event: AccountCreatedEvent): Either<AppError, Unit>
    suspend fun on(event: BalanceDepositedEvent): Either<AppError, Unit>
    suspend fun on(event: BalanceWithdrawEvent): Either<AppError, Unit>
    suspend fun on(event: PersonalInfoUpdatedEvent): Either<AppError, Unit>
    suspend fun on(event: ContactInfoChangedEvent): Either<AppError, Unit>
    suspend fun on(event: AccountStatusChangedEvent): Either<AppError, Unit>
}

@Component
class AccountEventHandlerServiceImpl(
    private val accountProjectionRepository: AccountProjectionRepository
) : AccountEventHandlerService {

    override suspend fun on(event: AccountCreatedEvent): Either<AppError, Unit> =
        eitherScope(ctx) {
            accountProjectionRepository.save(event.toAccount()).bind()
        }

    override suspend fun on(event: BalanceDepositedEvent): Either<AppError, Unit> =
        eitherScope(ctx) {
            findAndUpdateAccountById(event.accountId, event.version) { account ->
                account.depositBalance(event.balance).bind()
            }.bind()
        }

    private suspend fun findAndUpdateAccountById(
        accountId: AccountId,
        eventVersion: Long,
        block: suspend (Account) -> Account
    ): Either<AppError, Account> = eitherScope(ctx) {
        val foundAccount = findAndValidateVersion(accountId, eventVersion).bind()
        val accountForUpdate = block(foundAccount)
        accountProjectionRepository.update(accountForUpdate).bind()
    }

    private suspend fun findAndValidateVersion(
        accountId: AccountId,
        eventVersion: Long
    ): Either<AppError, Account> = eitherScope(ctx) {
        val foundAccount = accountProjectionRepository.getById(accountId).bind()
        validateVersion(foundAccount, eventVersion).bind()
        foundAccount
    }
}
The infrastructure layer read model repository uses MongoDB's Kotlin coroutines driver:

Kotlin

interface AccountProjectionRepository {
    suspend fun save(account: Account): Either<AppError, Account>
    suspend fun update(account: Account): Either<AppError, Account>
    suspend fun getById(id: AccountId): Either<AppError, Account>
    suspend fun getByEmail(email: String): Either<AppError, Account>
    suspend fun getAll(page: Int, size: Int): Either<AppError, AccountsList>
    suspend fun upsert(account: Account): Either<AppError, Account>
}

Kotlin

@Component
class AccountProjectionRepositoryImpl(
    mongoClient: MongoClient,
) : AccountProjectionRepository {

    private val accountsDB = mongoClient.getDatabase(ACCOUNTS_DB)
    private val accountsCollection = accountsDB.getCollection<AccountDocument>(ACCOUNTS_COLLECTION)

    override suspend fun save(account: Account): Either<AppError, Account> =
        eitherScope<AppError, Account>(ctx) {
            val insertResult = accountsCollection.insertOne(account.toDocument())
            log.info { "account insertOneResult: $insertResult, account: $account" }
            account
        }

    override suspend fun update(account: Account): Either<AppError, Account> =
        eitherScope(ctx) {
            val filter = and(eq(ACCOUNT_ID, account.accountId.string()), eq(VERSION, account.version))
            val options = FindOneAndUpdateOptions().upsert(false).returnDocument(ReturnDocument.AFTER)

            accountsCollection.findOneAndUpdate(filter, account.incVersion().bind().toBsonUpdate(), options)
                ?.toAccount()
                ?: raise(AccountNotFoundError("account with id: ${account.accountId} not found"))
        }

    override suspend fun upsert(account: Account): Either<AppError, Account> =
        eitherScope(ctx) {
            val filter = and(eq(ACCOUNT_ID, account.accountId.string()))
            val options = FindOneAndUpdateOptions().upsert(true).returnDocument(ReturnDocument.AFTER)

            accountsCollection.findOneAndUpdate(filter, account.toBsonUpdate(), options)
                ?.toAccount()
                ?: raise(AccountNotFoundError("account with id: ${account.accountId} not found"))
        }

    override suspend fun getById(id: AccountId): Either<AppError, Account> =
        eitherScope(ctx) {
            accountsCollection.find<AccountDocument>(eq(ACCOUNT_ID, id.string()))
                .firstOrNull()
                ?.toAccount()
                ?: raise(AccountNotFoundError("account with id: $id not found"))
        }

    override suspend fun getByEmail(email: String): Either<AppError, Account> =
        eitherScope(ctx) {
            val filter = and(eq(CONTACT_INFO_EMAIL, email))
            accountsCollection.find(filter).firstOrNull()?.toAccount()
                ?: raise(AccountNotFoundError("account with email: $email not found"))
        }

    override suspend fun getAll(page: Int, size: Int): Either<AppError, AccountsList> =
        eitherScope<AppError, AccountsList>(ctx) {
            parZip(coroutineContext, {
                accountsCollection.find()
                    .skip(page * size)
                    .limit(size)
                    .map { it.toAccount() }
                    .toList()
            }, {
                accountsCollection.find().count()
            }) { list, totalCount ->
                AccountsList(
                    page = page,
                    size = size,
                    totalCount = totalCount,
                    accountsList = list
                )
            }
        }
}

The path of read queries through the layers is very similar: we accept HTTP requests at the API layer:
["/api/v1/accounts"]) class AccountController( private val accountCommandService: AccountCommandService, private val accountQueryService: AccountQueryService ) { @Operation( method = "getAccountByEmail", operationId = "getAccountByEmail", description = "Get account by email", responses = [ ApiResponse( description = "Get account by email", responseCode = "200", content = [Content( mediaType = MediaType.APPLICATION_JSON_VALUE, schema = Schema(implementation = AccountResponse::class) )] ), ApiResponse( description = "bad request response", responseCode = "400", content = [Content(schema = Schema(implementation = ErrorHttpResponse::class))] )], ) @GetMapping(path = ["/email/{email}"]) suspend fun getAccountByEmail( @PathVariable @Email @Size( min = 6, max = 255 ) email: String ): ResponseEntity<out Any> = eitherScope(ctx) { accountQueryService.handle(GetAccountByEmailQuery(email)).bind() }.fold( ifLeft = { mapErrorToResponse(it) }, ifRight = { ResponseEntity.ok(it.toResponse()) } ) } Application Layer AccountQueryService methods: Kotlin interface AccountQueryService { suspend fun handle(query: GetAccountByIdQuery): Either<AppError, Account> suspend fun handle(query: GetAccountByEmailQuery): Either<AppError, Account> suspend fun handle(query: GetAllAccountsQuery): Either<AppError, AccountsList> } Kotlin @Service class AccountQueryServiceImpl( private val accountRepository: AccountRepository, private val accountProjectionRepository: AccountProjectionRepository ) : AccountQueryService { override suspend fun handle(query: GetAccountByIdQuery): Either<AppError, Account> = eitherScope(ctx) { accountRepository.getById(query.id).bind() } override suspend fun handle(query: GetAccountByEmailQuery): Either<AppError, Account> = eitherScope(ctx) { accountProjectionRepository.getByEmail(query.email).bind() } override suspend fun handle(query: GetAllAccountsQuery): Either<AppError, AccountsList> = eitherScope(ctx) { accountProjectionRepository.getAll(page = query.page, size = query.size).bind() } } And it uses PostgreSQL or MongoDB repositories to get the data depending on the query use case: Kotlin @Component class AccountProjectionRepositoryImpl( mongoClient: MongoClient, ) : AccountProjectionRepository { private val accountsDB = mongoClient.getDatabase(ACCOUNTS_DB) private val accountsCollection = accountsDB.getCollection<AccountDocument>(ACCOUNTS_COLLECTION) override suspend fun getByEmail(email: String): Either<AppError, Account> = eitherScope(ctx) { val filter = and(eq(CONTACT_INFO_EMAIL, email)) accountsCollection.find(filter) .firstOrNull() ?.toAccount() ?: raise(AccountNotFoundError("account with email: $email not found")) } } Final Thoughts In real-world applications, we have to implement many more necessary features, like K8s health checks, circuit breakers, rate limiters, etc., so this project is simplified for demonstration purposes. The source code is in my GitHub, please star it if is helpful and useful for you. For feedback or questions, feel free to contact me!
Docker has become an essential tool for developers, offering consistent and isolated environments without installing full-fledged products locally. The ideal setup for microservice development using Spring Boot with MySQL as the backend often involves a remotely hosted database. However, for rapid prototyping or local development, running a MySQL container through Docker offers a more streamlined approach. I encountered a couple of issues while attempting to set up this configuration with the help of Docker Desktop for a proof of concept. An online search revealed a lack of straightforward guides on integrating Spring Boot microservices with MySQL in Docker Desktop; most resources primarily focus on containerizing the Spring Boot application. Recognizing this gap, I decided to write this short article.

Prerequisites

Before diving in, we must have the following:

A foundational understanding of Spring Boot and microservices architecture
Familiarity with Docker containers
Docker Desktop installed on our machine

Docker Desktop Setup

We can install Docker Desktop using this link. Installation is straightforward and includes steps that can be navigated efficiently, as illustrated in the accompanying screenshots.

Configuring the MySQL Container

Once Docker Desktop is installed and launched, we step through some standard questions (the registration part can be skipped). Once the desktop app is ready, we need to search for the MySQL container, as shown below. We need to click Pull and then Run the container. Once the container runs, the settings dialog will pop up. Please enter the settings as follows:

MYSQL_ROOT_PASSWORD: This environment variable specifies the password that will be set for the MySQL root superuser account.
MYSQL_DATABASE: This environment variable allows us to specify the name of a database that will be created on image startup. If a user/password was supplied (see below), that user will be granted superuser access (corresponding to GRANT ALL) to this database.
MYSQL_USER, MYSQL_PASSWORD: These variables are used to create a new user and set that user's password. This user will be granted superuser permissions for the database specified by the MYSQL_DATABASE variable.

Upon running the container, Docker Desktop displays logs indicating the container's status. We can now connect to the MySQL instance using tools like MySQL Workbench to manage database objects. An equivalent command-line setup is sketched below.
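For those who prefer the command line over the Docker Desktop dialogs, roughly the same container can be started with a docker run command along these lines. The container name and image tag are illustrative assumptions; the credentials mirror the Spring configuration used later in this article:

Shell

docker run -d --name mysql-esign \
  -p 3306:3306 \
  -e MYSQL_ROOT_PASSWORD=Password1 \
  -e MYSQL_DATABASE=e-sign \
  -e MYSQL_USER=e-sign \
  -e MYSQL_PASSWORD=Password1 \
  mysql:8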
Spring Application Configuration

In the Spring application, we can add the following configuration to application.properties:

Properties files

spring.esign.datasource.jdbc-url=jdbc:mysql://localhost:3306/e-sign?allowPublicKeyRetrieval=true&useSSL=false
spring.esign.datasource.username=e-sign
spring.esign.datasource.password=Password1

We opted for the custom prefix spring.esign over the default spring.datasource for our database configuration within the Spring Boot application. This approach shines in scenarios where the application requires connections to multiple databases. To enable this custom configuration, we need to define the Spring Boot configuration class ESignDbConfig:

Java

@Configuration
@EnableTransactionManagement
@EnableJpaRepositories(
        entityManagerFactoryRef = "eSignEntityManagerFactory",
        transactionManagerRef = "eSignTransactionManager",
        basePackages = "com.icw.esign.repository")
public class ESignDbConfig {

    @Bean("eSignDataSource")
    @ConfigurationProperties(prefix = "spring.esign.datasource")
    public DataSource geteSignDataSource() {
        return DataSourceBuilder.create().type(HikariDataSource.class).build();
    }

    @Bean(name = "eSignEntityManagerFactory")
    public LocalContainerEntityManagerFactoryBean eSignEntityManagerFactory(
            EntityManagerFactoryBuilder builder,
            @Qualifier("eSignDataSource") DataSource dataSource) {
        return builder.dataSource(dataSource)
                .packages("com.icw.esign.dao")
                .build();
    }

    @Bean(name = "eSignTransactionManager")
    public PlatformTransactionManager eSignTransactionManager(
            @Qualifier("eSignEntityManagerFactory") EntityManagerFactory entityManagerFactory) {
        return new JpaTransactionManager(entityManagerFactory);
    }
}

@Bean("eSignDataSource"): This method defines a Spring bean for the eSign module's data source. The @ConfigurationProperties(prefix = "spring.esign.datasource") annotation automatically maps and binds all configuration properties starting with spring.esign.datasource from the application's configuration files (like application.properties or application.yml) to this DataSource object. The method uses DataSourceBuilder to create and configure a HikariDataSource, a highly performant JDBC connection pool. This implies that the eSign module will use a dedicated database whose connection parameters are isolated from other modules or the main application database.

@Bean(name = "eSignEntityManagerFactory"): This method creates a LocalContainerEntityManagerFactoryBean, which is responsible for creating the EntityManagerFactory. This factory is crucial for managing the JPA entities specific to the eSign module. The EntityManagerFactory is configured to use the eSignDataSource for its database operations and to scan the package com.icw.esign.dao for entity classes. This means that only entities in this package or its subpackages will be managed by this EntityManagerFactory and thus can access the eSign database.

@Bean(name = "eSignTransactionManager"): This defines a PlatformTransactionManager bound to the eSign module's EntityManagerFactory. This transaction manager ensures that all database operations performed by entities managed by the eSignEntityManagerFactory are wrapped in transactions. It enables the application to manage transaction boundaries, roll back operations on failures, and commit changes when operations succeed.

Repository

Now that we have defined the configurations, we can create repository classes and build the other objects required for the API endpoint.
Java

@Repository
public class ESignDbRepository {

    private static final Logger logger = LoggerFactory.getLogger(ESignDbRepository.class);

    @Qualifier("eSignEntityManagerFactory")
    @Autowired
    private EntityManager entityManager;

    @Autowired
    ObjectMapper objectMapper;

    String P_GET_DOC_ESIGN_INFO = "p_get_doc_esign_info";

    public List<DocESignMaster> getDocumentESignInfo(String docUUID) {
        StoredProcedureQuery proc =
                entityManager.createStoredProcedureQuery(P_GET_DOC_ESIGN_INFO, DocESignMaster.class);
        proc.registerStoredProcedureParameter("v_doc_uuid", String.class, ParameterMode.IN);
        proc.setParameter("v_doc_uuid", docUUID);
        try {
            return (List<DocESignMaster>) proc.getResultList();
        } catch (PersistenceException ex) {
            logger.error("Error while fetching document eSign info for docUUID: {}", docUUID, ex);
        }
        return Collections.emptyList();
    }
}

@Qualifier("eSignEntityManagerFactory"): Specifies which EntityManagerFactory should be used to create the EntityManager, ensuring that the correct database configuration is used for eSign operations.

Conclusion

Integrating Spring Boot microservices with Docker Desktop streamlines microservice development and testing. This guide walked through the essential steps of setting up a Spring Boot application and ensuring seamless communication with a MySQL container hosted in Docker Desktop. This quick setup is useful for a proof of concept or an isolated local development environment.
What Is OAuth2?

OAuth 2.0 is an authorization protocol. It provides the framework to obtain limited access to a protected resource by a third-party application on behalf of the resource owner. For example, we log in to our LinkedIn account using a Google account username and password. The Google authorization (OAuth 2.0) server grants a temporary access token to LinkedIn, which authorizes the user to access LinkedIn resources. Note that here, LinkedIn trusts Google to validate the user and act as an authorization proxy.

What Are Microservices?

Microservices are a service-oriented architecture pattern wherein applications are built as a collection of small, independent service units. It is a software engineering approach that focuses on decomposing an application into single-function modules with well-defined interfaces. These modules can be independently deployed and operated by small teams that own the entire life cycle of the service.

Why Is OAuth2 a Good Solution for Secure Communications With Microservices?

This separation of concerns strengthens microservices security by decoupling authorization from business logic. The responsibility is delegated to a centralized, trusted authorization server, and the actual application is free from security concerns in this regard. It promotes the granularity of services, which is what microservices are all about. Apart from reducing complexity, OAuth 2.0 in microservices provides a platform to implement consistent, standard security policies across the system. The authorization is flexible, meaning it can be revoked at any time; this helps security management restrict unnecessary access to resources. Since access tokens provided by the OAuth 2.0 server are stateless (JSON Web Tokens, JWT), they eliminate the need for storage and transmission of sensitive credentials or session data between microservices. Overall, OAuth 2.0 and microservices in combination enhance the performance and scalability of the system.

Purpose

I wanted a solution where we could easily apply OAuth2 and OAuth2 clients for secure communication with all of the microservices, focusing on how to achieve the full OAuth2 flavor in a microservices architecture. The user can't access the API without a token; a token is issued when the user supplies basic authentication details and login credentials. All requests go through one entry point (the API gateway), but services can also communicate with each other. The API gateway uses dynamic routing with the Zuul Netflix OSS component. Authorization is checked on every request: when a request arrives at a service, the service asks the authorization server to verify whether it is authenticated. The entire meta configuration lives in a central configuration on GitHub (you can manage it in any repository).

Goal

Achieve authentication/authorization based on Spring Security, OAuth2, and the OAuth2 client
Understand microservices architecture using Spring Cloud and Netflix OSS
Demonstrate a microservice architecture based on Java, Spring, and OAuth2

Spring Cloud and Microservices

Firstly, we do not write a microservice. We write a service that will eventually be called a microservice when deployed with other services to form an application. Having said that, Spring Cloud just gives you abstractions over a set of tools (Eureka, Zuul, Feign, Ribbon, etc.), making it easy for you to integrate them with Spring applications.
However, you can also achieve a microservice architecture without using Spring Cloud. You can take advantage of tools like Kubernetes, Docker Swarm, HAProxy, Kong, NGINX, etc. to achieve the same. Using Spring Cloud has its own pros and cons, and vice versa.

High-Level Microservice Architecture With Authorization

Users log in to the system using basic authorization and login credentials. The user gets a token if the basic auth and login credentials match. Next, the user sends a request to access data from a service. The API gateway receives the request and checks with the authorization server.

Every request has one entry point: the API gateway
Security checking and dynamic routing to the service
Every service has a single database to manipulate its data

Spring Cloud Key Concepts and Features

Spring Cloud helps microservices manage configuration.

Intelligent routing and service discovery
Service-to-service calls
Load balancing (properly distributes network traffic to the backend servers)
Leader election (applications coordinate with each other through a third-party system)
Global locks (two threads cannot access the same resource at the same time)
Distributed configuration and messaging

If you want to offer many services in one application, a cloud-based application is an easy way to do it; Spring Cloud works in the same way.

Spring Boot Key Concepts and Features

Spring Boot is used to create the microservices themselves.

Creates stand-alone Spring applications
Embedded HTTP servers (Tomcat, Jetty, or Undertow); no need to deploy a WAR file
Externalized configuration
Security (built-in basic authentication on all HTTP endpoints)
Application events and listeners

Spring Boot suits production web applications and reduces the time spent on unit and integration test development.

Spring Cloud Advantages

Provides cloud service development
Microservice-based architecture and configuration
Provides inter-service communication
Based on the Spring Boot model

Spring Cloud's 5 Main Annotations

1. @EnableConfigServer: This annotation converts the application into a server that other applications use to get their configuration.
2. @EnableEurekaServer: This annotation sets up the Eureka discovery service, which other applications can use to locate services.
3. @EnableDiscoveryClient: This annotation helps an application register with the service discovery and discover other services using it.
4. @EnableCircuitBreaker: Use the circuit breaker pattern to continue operating when related services fail and to prevent cascading failures. This annotation is used for the Hystrix circuit breaker.
5. @HystrixCommand(fallbackMethod = "methodName"): Hystrix is a latency and fault-tolerance library for distributed systems.

4 Common Netflix Components

Spring Cloud Netflix provides Netflix OSS integrations for Spring Boot apps through autoconfiguration and binding to the Spring Environment and other Spring programming model idioms. With a few simple annotations, you can quickly enable and configure the common patterns inside your application and build large distributed systems with battle-tested Netflix components. The patterns provided include service discovery (Eureka), circuit breaker (Hystrix), intelligent routing (Zuul), and client-side load balancing (Ribbon).
1. Eureka (Service Registration and Discovery)

A REST service registers itself at the registry (Eureka client)
A web application consumes the REST service as a registry-aware client (Spring Cloud Netflix Feign client)

2. Ribbon (Dynamic Routing and Load Balancer)

Ribbon primarily provides client-side load-balancing algorithms.

APIs that integrate load balancing, fault tolerance, and caching/batching on top of other Ribbon modules and Hystrix
A REST client built on top of Apache HttpClient, integrated with load balancers (deprecated and being replaced)
Configurable load-balancing rules

3. Hystrix (Circuit Breaker)

Hystrix is a fault-tolerance Java library. This tool is designed to separate points of access to remote services, systems, and third-party libraries in a distributed environment like microservices. It improves the overall system by isolating failing services and preventing the cascading effect of failures.

4. Zuul (Edge Server)

Zuul is the front door for all requests from devices and websites to the backend of the Netflix streaming application. Zuul will serve as our API gateway.

Handles dynamic routing
Built to enable dynamic routing, monitoring, resiliency, and security

What Is a Feign Client?

Netflix provides Feign as an abstraction over REST-based calls, by which microservices can communicate with each other without developers having to bother about REST internals. The Feign client works on the declarative principle: we create an interface/contract, and Spring creates the actual implementation on the fly, so the REST-based service call is abstracted away from developers. If you want to customize the call, such as encoding your request or decoding the response into a custom object, you can also do that with Feign in a declarative way. Feign is an important tool for microservice developers communicating with other microservices via REST APIs. To use it, we must first enable the Spring Cloud support for it in our Spring Boot application with the @EnableFeignClients annotation at the class level on a @Configuration class.
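To make the declarative principle concrete, a minimal contract could look like the sketch below. The service name matches a service registered later in this article, but the endpoint path, method name, and return type are hypothetical illustrations, not taken from the project's source:

Java

import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

// Hypothetical Feign contract: Spring generates the implementation at runtime,
// resolving "clinic-management-service" through the Eureka registry.
// Requires @EnableFeignClients on a configuration class, as noted above.
@FeignClient(name = "clinic-management-service")
public interface ClinicClient {

    // Declaratively maps to GET /clinics/{id} on the target service.
    @GetMapping("/clinics/{id}")
    String getClinicById(@PathVariable("id") Long id);
}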
Server-Side Load Balancing

In a Java EE architecture, we deploy our WAR/EAR files to multiple application servers, create a pool of servers, and put a load balancer (e.g., a NetScaler) with a public IP in front of it. The client makes a request using that public IP, and the load balancer decides which internal application server to forward the request to, using a round-robin or sticky-session algorithm. We call this server-side load balancing.

Technology Stack

Java 8+
Latest Spring
Spring Security OAuth2, OAuth2 client
Spring Cloud Netflix OSS
PostgreSQL
IntelliJ

How To Implement OAuth2 Security in Microservices

Step 1: Create the Project "central configuration" for All Services

With microservices, we create a central config server where all configurable parameters of the microservices are stored and version-controlled. The benefit of a central config server is that if we change a property for a microservice, the change can be reflected on the fly without redeploying the microservice. You can create the project using Spring Initializr.

application.properties:

Properties files

spring.application.name=ehealth-central-configuration
server.port=8888
eureka.client.service-url.defaultZone=http://localhost:8761/eureka/
# available profiles of the application
spring.profiles.active=local,development,production
spring.cloud.config.server.git.uri=https://github.com/amran-bd/cloud-config
spring.cloud.config.server.git.clone-on-start=true
spring.cloud.config.server.git.search-paths=patient-management-service,ehealth-api-gateway,eureka-service-discovery,clinic-management-service
management.security.enabled=false
# To remove "Could not locate PropertySource: None of labels [] found"
health.config.enabled=false
# To remove "Could not locate PropertySource: I/O error on GET request"
spring.cloud.config.enabled=false

Hint: You can use your own Git server or local machine. A new service name is added when a new service is introduced. Here is the link to my GitHub repository if you would like to use it.

EhealthCentralConfigurationApplication.java class example:

Java

package com.amran.central.config;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.config.server.EnableConfigServer;

@EnableConfigServer
@SpringBootApplication
public class EhealthCentralConfigurationApplication {

    public static void main(String[] args) {
        SpringApplication.run(EhealthCentralConfigurationApplication.class, args);
    }

}

You must include the @EnableConfigServer annotation.

Example of creating a central configuration for services — hint: create the folder <projectName>/<projectName-development.properties>.

Step 2: Create the Project "Discovery Server" for All Discoverable Services

We have already discussed the discovery server in this article.

bootstrap.properties:

Properties files

spring.application.name=eureka-service-discovery
spring.profiles.active=development
# ip and port of the config server
spring.cloud.config.uri=http://localhost:8888
# expose actuator endpoints
management.endpoints.web.exposure.include=refresh

management.security.enabled=false
spring.cloud.config.fail-fast=true

Here, we can enable and disable other actuator endpoints through the property files. If you want to enable all actuator endpoints, add the property management.endpoints.web.exposure.include=*. To enable only specific actuator endpoints, provide the list of endpoint IDs: management.endpoints.web.exposure.include=health,info,beans,env.

In some cases, it may be desirable to fail the startup of a service if it cannot connect to the config server. If this is the desired behavior, set the bootstrap configuration property spring.cloud.config.fail-fast=true, and the client will halt with an exception.

EurekaServiceDiscoveryApplication.java class example:

Java

package com.amran.service.discovery;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.netflix.eureka.server.EnableEurekaServer;

@SpringBootApplication
@EnableEurekaServer
public class EurekaServiceDiscoveryApplication {

    public static void main(String[] args) {
        SpringApplication.run(EurekaServiceDiscoveryApplication.class, args);
    }

}

You must include the @EnableEurekaServer annotation.
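As noted above, each service exposes the refresh actuator endpoint, so a property change pushed to the Git-backed config server can be pulled into a running service without a redeploy. A sketch of the call, assuming a service instance on a hypothetical port 8081 (beans must be marked @RefreshScope for their values to be rebound):

Shell

curl -X POST http://localhost:8081/actuator/refresh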
Step 3: Create the Project "API Gateway" for All Services' Entry Points

This is the most valuable portion. Here, we write the authorization server in the same project.

application.yml:

YAML

#hystrix:
#  command:
#    default:
#      execution:
#        isolation:
#          thread:
#            timeoutInMilliseconds: 5000
hystrix:
  command:
    clinic-management-service:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 5000
    patient-management-service:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 5000

Here, we define a timeout for each separate service; you can also use the default.

bootstrap.properties:

Properties files

spring.application.name=ehealth-api-gateway
spring.profiles.active=development
# ip and port of the config server
spring.cloud.config.uri=http://localhost:8888
# expose actuator endpoints
management.endpoints.web.exposure.include=refresh

management.security.enabled=false
spring.cloud.config.fail-fast=true

Central configuration example, ehealth-api-gateway-development.properties:

Properties files

spring.application.name=ehealth-api-gateway
server.port=8080
eureka.client.service-url.defaultZone=http://localhost:8761/eureka/

## PostgreSQL
spring.datasource.url=jdbc:postgresql://localhost:3307/ehealth-security
spring.datasource.username=postgres
spring.datasource.password=test1373
spring.datasource.type=com.zaxxer.hikari.HikariDataSource
# Hikari will use the above plus the following to set up connection pooling
spring.datasource.hikari.minimumIdle=3
spring.datasource.hikari.maximumPoolSize=500
spring.datasource.hikari.idleTimeout=30000
spring.datasource.hikari.poolName=SpringBootJPAHikariCP
spring.datasource.hikari.maxLifetime=2000000
spring.datasource.hikari.connectionTimeout=30000
spring.datasource.pool-prepared-statements=true
spring.datasource.max-open-prepared-statements=250
spring.jpa.hibernate.connection.provider_class=org.hibernate.hikaricp.internal.HikariCPConnectionProvider
spring.jpa.properties.hibernate.dialect=org.hibernate.dialect.PostgreSQL82Dialect

# Hibernate configuration
spring.jpa.generate-ddl=true
spring.jpa.hibernate.ddl-auto=update
spring.jpa.show-sql=true

server.error.include-stacktrace=never

#feign.hystrix.enabled=true
#hystrix.shareSecurityContext=true

# All URLs with the /api prefix will be interpreted
zuul.prefix=/api

# Dynamic service registration in the Eureka server (API gateway)
zuul.routes.patient-management-service.path=/patient-management-service/**
#zuul.routes.patient-management-service.url=http://localhost:8081
zuul.routes.patient-management-service.sensitive-headers=
zuul.routes.patient-management-service.service-id=patient-management-service

zuul.routes.clinic-management-service.path=/clinic-management-service/**
#zuul.routes.clinic-management-service.url=http://localhost:8082
zuul.routes.clinic-management-service.sensitive-headers=
zuul.routes.clinic-management-service.service-id=clinic-management-service

Zuul applies four filter types while doing dynamic routing. Zuul filters store request and state information in the RequestContext (and share it through it). You can use that to get to the HttpServletRequest and then log the HTTP method and URL of the request before it is sent on its way.
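A pre filter that does exactly this kind of logging might look like the following sketch. The project registers its own PreFilter (imported in the application class below); this is an illustrative reconstruction consistent with the RouteFilter shown next, not the project's actual source:

Java

package com.amran.api.gateway.filter;

import com.netflix.zuul.ZuulFilter;
import com.netflix.zuul.context.RequestContext;

import javax.servlet.http.HttpServletRequest;

public class PreFilter extends ZuulFilter {

    @Override
    public String filterType() {
        return "pre"; // runs before the request is routed
    }

    @Override
    public int filterOrder() {
        return 1;
    }

    @Override
    public boolean shouldFilter() {
        return true;
    }

    @Override
    public Object run() {
        // RequestContext gives access to the current HttpServletRequest.
        HttpServletRequest request = RequestContext.getCurrentContext().getRequest();
        System.out.println("Inside Pre Filter: " + request.getMethod() + " " + request.getRequestURL());
        return null;
    }
}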
The four filters are ErrorFilter, PreFilter, PostFilter, and RouteFilter. For example, the RouteFilter:

Java

package com.amran.api.gateway.filter;

import com.netflix.zuul.ZuulFilter;

/**
 * @Author : Amran Hosssain on 6/27/2020
 */
public class RouteFilter extends ZuulFilter {

    @Override
    public String filterType() {
        return "route";
    }

    @Override
    public int filterOrder() {
        return 1;
    }

    @Override
    public boolean shouldFilter() {
        return true;
    }

    @Override
    public Object run() {
        System.out.println("Inside Route Filter");
        return null;
    }
}

EhealthApiGatewayApplication.java class example:

Java

package com.amran.api.gateway;

import com.amran.api.gateway.filter.PostFilter;
import com.amran.api.gateway.filter.PreFilter;
import com.amran.api.gateway.filter.ErrorFilter;
import com.amran.api.gateway.filter.RouteFilter;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
import org.springframework.cloud.netflix.zuul.EnableZuulProxy;
import org.springframework.cloud.openfeign.EnableFeignClients;
import org.springframework.context.annotation.Bean;

@EnableFeignClients
@EnableCircuitBreaker
@EnableDiscoveryClient
@EnableZuulProxy
@SpringBootApplication
public class EhealthApiGatewayApplication {

    public static void main(String[] args) {
        SpringApplication.run(EhealthApiGatewayApplication.class, args);
    }

    @Bean
    public PreFilter preFilter() {
        return new PreFilter();
    }

    @Bean
    public PostFilter postFilter() {
        return new PostFilter();
    }

    @Bean
    public ErrorFilter errorFilter() {
        return new ErrorFilter();
    }

    @Bean
    public RouteFilter routeFilter() {
        return new RouteFilter();
    }
}

The annotations on the EhealthApiGatewayApplication.java class have already been discussed; see above if anything is unclear.

Spring Security and OAuth2 Implementation in a Microservices Architecture

The OAuth2 implementation is based on Spring Security.

Step 4: Create the Project "patient-management-service"

Patient-related data will be manipulated here.
POM.xml:

XML

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.1.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.amran.patient.management</groupId>
    <artifactId>patient-management-service</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>patient-management-service</name>
    <description>patient-management-service project for Spring Boot</description>

    <properties>
        <java.version>1.8</java.version>
        <spring-cloud.version>Hoxton.SR5</spring-cloud.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-jpa</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-starter-netflix-eureka-client</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-config-client</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-security</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.security.oauth</groupId>
            <artifactId>spring-security-oauth2</artifactId>
            <version>2.5.0.RELEASE</version>
            <scope>compile</scope>
        </dependency>
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <scope>runtime</scope>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-configuration-processor</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
            <exclusions>
                <exclusion>
                    <groupId>org.junit.vintage</groupId>
                    <artifactId>junit-vintage-engine</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
    </dependencies>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-dependencies</artifactId>
                <version>${spring-cloud.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <build>
        <finalName>${project.artifactId}</finalName>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
        <resources>
            <resource>
                <filtering>true</filtering>
                <directory>src/main/resources</directory>
                <includes>
                    <include>*.properties</include>
                </includes>
            </resource>
        </resources>
    </build>

</project>
bootstrap.properties:

Properties files

server.url = Patient Management Service Working...
spring.application.name=patient-management-service
spring.profiles.active=development
# ip and port of the config server where we can get our central configuration
spring.cloud.config.uri=http://localhost:8888
# expose actuator endpoints
management.endpoints.web.exposure.include=refresh

management.security.enabled=false
spring.cloud.config.fail-fast=true


## Security parameters for request verification ##
# We use basic authorization and a token. The auth server verifies the token,
# which was generated by the authorization server itself, based on the criteria below.
client_id=kidclient
client_credential = kidsecret
check_authorization_url = http://localhost:8080/oauth/check_token
resources_id = ehealth

PatientManagementServiceApplication.java class example:

Java

package com.amran.patient.management;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.client.discovery.EnableDiscoveryClient;

@EnableDiscoveryClient
@SpringBootApplication
public class PatientManagementServiceApplication {

    public static void main(String[] args) {
        SpringApplication.run(PatientManagementServiceApplication.class, args);
    }
}

You must annotate the class with @EnableDiscoveryClient so it will be registered with the Eureka server as a service or client.

Security OAuth2 Client: ResourceServerConfig.java and WebSecurityConfig.java Class Examples

You need a WebSecurityConfigurerAdapter to secure the /authorize endpoint and to provide a way for users to authenticate. A Spring Boot application would do that for you by adding its own WebSecurityConfigurerAdapter with HTTP basic auth. It creates a filter chain with order=0 by default and protects all resources unless you provide a request matcher. @EnableResourceServer does something similar, but the filter chain it adds is at order=3 by default, while WebSecurityConfigurerAdapter has an @Order(100) annotation. So the resource server is checked first (authentication), and then the checks in your extension of WebSecurityConfigurerAdapter are applied.
Java

package com.amran.patient.management.security;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.HttpMethod;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.oauth2.config.annotation.web.configuration.EnableResourceServer;
import org.springframework.security.oauth2.config.annotation.web.configuration.ResourceServerConfigurerAdapter;
import org.springframework.security.oauth2.config.annotation.web.configurers.ResourceServerSecurityConfigurer;

/**
 * @Author : Amran Hosssain on 6/27/2020
 */
@Configuration
@EnableResourceServer
public class ResourceServerConfig extends ResourceServerConfigurerAdapter {

    @Value("${resources_id}")
    private String resourceId;

    @Override
    public void configure(HttpSecurity http) throws Exception {
        http
                .headers().frameOptions().disable()
                .and()
                .csrf().disable()
                .authorizeRequests()
                .antMatchers("/eureka/**").permitAll()
                .anyRequest()
                .authenticated();
    }

    @Override
    public void configure(ResourceServerSecurityConfigurer resources) throws Exception {
        resources.resourceId(resourceId);
    }

}

Java

package com.amran.patient.management.security;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.authentication.AuthenticationManager;
import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;
import org.springframework.security.oauth2.provider.authentication.OAuth2AuthenticationManager;
import org.springframework.security.oauth2.provider.token.RemoteTokenServices;
import org.springframework.security.oauth2.provider.token.ResourceServerTokenServices;

/**
 * @Author : Amran Hosssain on 6/27/2020
 */
@Configuration
@EnableWebSecurity
public class WebSecurityConfig extends WebSecurityConfigurerAdapter {

    @Value("${client_id}")
    private String clientId;

    @Value("${client_credential}")
    private String clientSecret;

    @Value("${check_authorization_url}")
    private String checkAuthUrl;

    @Bean
    public ResourceServerTokenServices tokenServices() {
        RemoteTokenServices tokenServices = new RemoteTokenServices();
        tokenServices.setClientId(clientId);
        tokenServices.setClientSecret(clientSecret);
        tokenServices.setCheckTokenEndpointUrl(checkAuthUrl);
        return tokenServices;
    }

    @Override
    public AuthenticationManager authenticationManagerBean() throws Exception {
        OAuth2AuthenticationManager authenticationManager = new OAuth2AuthenticationManager();
        authenticationManager.setTokenServices(tokenServices());
        return authenticationManager;
    }
}

Check How It Works

Project run sequence: CentralConfigServer -> DiscoveryServer -> API Gateway Server -> other services.

1. Generate Token
2. Client Details in the Database
3. User Record
4. Generate Token
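The token request itself goes to the authorization server's /oauth/token endpoint, with the client credentials from the configuration above (kidclient/kidsecret) as HTTP basic auth. A sketch, assuming the client is registered for the password grant; the user credentials are placeholders:

Shell

curl -u kidclient:kidsecret \
  -d "grant_type=password&username=<username>&password=<password>" \
  http://localhost:8080/oauth/token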
5. Call the Patient Management Service (Zuul Dynamic Routing)
6. Call the Patient Service Directly (Token Verified by the Auth Server)

Note: Without a token, you can't call the service.

7. Call the Clinic Management Service (Zuul Dynamic Routing)

Conclusion

This article shows an OAuth2 implementation in a microservices architecture: secure communication, a single entry point, dynamic routing, fallback solutions, centralized configuration, and an OAuth2 client in each service to secure every API and ensure every request is authorized.

Source Code

The full source code can be found here.