Platform Engineering Trends in Cloud-Native: Q&A With Jonas Bonér
Cloud abstractions have reshaped the relationship between developers and infrastructure and given rise to event-driven microservices patterns. Higher-level abstractions are emerging to manage the resulting complexity.
The rise of Kubernetes, cloud-native, and microservices spawned major changes in the architectures and abstractions that developers use to create modern applications. In this multi-part series, I talk with some of the leading experts across various layers of the stack, from networking infrastructure to application infrastructure and middleware to telemetry data and modern observability concerns, to understand emergent platform engineering patterns that are affecting developer workflow around cloud-native. The next participant in our series is Jonas Bonér, CEO and co-founder of Lightbend and the creator of Akka, an event-driven middleware project for building highly concurrent, distributed, and resilient message-driven applications.
Q: We are nearly a decade into containers and Kubernetes (Kubernetes was first announced in June 2014). How would you characterize how things look different today than they did ten years ago, especially compared to the old world of systems engineers and network administrators, with a big dividing line between those operations concerns and the developers on the other side of the wall? What do you think are the big changes that DevOps and the evolution of platform engineering and site reliability engineering have ushered in?
A: Over the past decade, we've seen a dramatic shift in the relationship between developers and infrastructure/operations teams. The rise of technologies like containers and Kubernetes has broken down the rigid walls that used to exist between these groups.
Some fundamental changes I've observed:
- Infrastructure is now code: Tools like Terraform and Kubernetes enable infrastructure to be defined and managed as code in a versioned way, just like application code. This brings infrastructure closer to developers.
- Automation everywhere: There is much more emphasis on automation and self-service to reduce toil. Developers can spin up their own environments without filing tickets.
- Blurred lines: "You build it, you run it" mentality where developers have more ownership over production environments. Platform teams work closely with developers.
- Focus on observability: With complex distributed systems, understanding behavior is critical, so there is more investment in monitoring, logging, and tracing (see the tracing sketch at the end of this answer).
- Alignment on productivity: Teams focus on improving developer velocity and productivity rather than throwing issues over the wall.
Overall, it's a much more collaborative environment today between infrastructure and application teams thanks to modern platforms and culture change through DevOps. The days of big divides between these groups are fading.
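To make the observability point concrete, here is a minimal sketch of manual tracing with the OpenTelemetry Java API. The service name, tracer name, and span names are illustrative, not from the interview:

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class CheckoutService {
    // The instrumentation scope name ("checkout-service") is an illustrative choice.
    private static final Tracer tracer =
        GlobalOpenTelemetry.getTracer("checkout-service");

    void processOrder(String orderId) {
        Span span = tracer.spanBuilder("processOrder").startSpan();
        try (Scope ignored = span.makeCurrent()) {
            span.setAttribute("order.id", orderId);
            // ... business logic; child spans created here join this trace automatically
        } catch (Exception e) {
            span.recordException(e);
            throw e;
        } finally {
            span.end();
        }
    }
}
```

With an exporter configured, spans like this let a platform team follow a request across service boundaries, which is exactly the insight the "wall" between dev and ops used to block.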
Q: The popularity of the major cloud service platforms and all of the thrust behind the last ten years of SaaS applications and cloud-native created a ton of new abstractions for the level at which developers are able to interact with underlying cloud and network infrastructure. How has this trend of raising the abstraction for interacting with infrastructure affected developers?
A: The rise of cloud platforms and SaaS has dramatically raised the level of abstraction for developers interacting with infrastructure. This has had both advantages and drawbacks:
Pros:
- Less operational burden: reduced infrastructure management
- More focus on apps: spend time on app logic rather than plumbing
- Improved productivity: faster building and shipping
- Access to services: leverage powerful cloud capabilities
- Skill shift: less ops, more APIs, architecture, design
- Testing flexibility: easy spin up and tear down of environments
- Geo expansion simplified: deploy apps globally more easily
Cons:
- Overwhelming choices: many services to evaluate and choose from
- New complexities: introduced with microservices, distributed systems
- Time spent learning: understanding how to integrate new services
- New skill requirements: Kubernetes, CI/CD, and observability skills needed
- Loss of control: less customization and visibility possible
- Vendor lock-in concerns: reliance on cloud provider APIs
In summary, while abstraction has unlocked developer productivity and velocity improvements, it has also introduced new burdens around selection, learning, control, and skill shifts that need to be managed. There are tradeoffs, but overall, it has created opportunities by enabling more focus on app innovation.
Q: What are the areas where it makes sense for developers to have to really think about underlying systems versus the ones where having a high degree of instrumentation or customization ("shift left") is going to be very important?
A: There are certain areas where it still makes sense for developers to have visibility and control over underlying infrastructure, especially when performance considerations are essential:
- Performance: For high-throughput applications that need to handle substantial load and processing, developers may need to select and configure infrastructure components like databases, caching, networking, and compute to ensure optimal performance. Having visibility into resource utilization and tuning options is critical.
- Security: Applications dealing with sensitive data may require developers to tightly control infrastructure pieces related to security, like identity management, encryption, VPCs/firewalls, and auditing capabilities.
- Latency: For applications that demand low latency, such as gaming, financial trading, or IoT, the network topology and routing components become very important for developers to optimize.
- Cost: When building price-sensitive applications, developers may need to optimize the underlying compute instances, storage, and databases to control costs.
- Legacy Support: Integration with legacy systems may dictate infrastructure decisions around networking, message queues, or databases.
- Compliance: Highly regulated industries like healthcare may require restricted infrastructure choices to satisfy compliance requirements.
In these situations, developers need to take a more hands-on approach to actively architect, select, and configure the best underlying infrastructure components to meet the application's specific performance, security, latency, compliance, and customization needs.
Q: Despite the obvious immense popularity of cloud-native, much of the world's infrastructure (especially in highly regulated industries) is still running in on-prem datacenters. What does the future hold for all this legacy infrastructure and millions of servers humming along in data centers? What are the implications for managing these mixed cloud-native infrastructures together with legacy data centers over time?
A: There is still a massive amount of legacy infrastructure and workloads running on-premises in data centers, especially in highly regulated industries like healthcare, financial services, and government. Some thoughts on the future of managing this legacy infrastructure alongside modern cloud-native platforms:
- Gradual migration: Rather than rip and replace, we'll see a gradual hybrid pattern emerge where workloads move incrementally to the cloud when ready. This allows organizations to leverage the cloud while amortizing their on-prem investments.
- Focus on abstraction: Platforms that abstract away infrastructure differences will gain popularity to operate more efficiently across on-prem and cloud environments. This provides portability.
- Data gravity: Rather than migrate large datasets, we may process data in place using serverless/container approaches on legacy infrastructure. Move compute to the data.
- Edge computing: Processing data locally on edge devices rather than moving to the cloud can help maintain legacy infrastructure while improving latency.
- Compliance first: Highly regulated industries will focus on compliance/security when adopting the cloud. Legacy infrastructure stays for compliant workloads.
- Cost savings: A reduced data center footprint over time lowers energy, facilities, and hardware refresh costs, though spending shifts from capital expenditure to operating expenditure.
- Staff skills transition: IT teams will need training on cloud-native skills. New roles like Site Reliability Engineers emerge to pair infrastructure and application expertise.
Overall, the future will likely involve a hybrid model balancing legacy gear in data centers with the adoption of modern architectures using abstraction layers to streamline operations. However, regulated industries will move cautiously based on security, compliance, and data gravity concerns.
Q: What do you think are some of the modern checklist items that developers care most about in terms of their workflow and how platform engineering makes their lives more productive? Broadly speaking, what are conditions that are most desirable versus least desirable in terms of the built environment and toolchains that modern developers care about?
A: Here are some of the critical items that are most important to developers today in terms of optimizing their workflow and productivity:
Most Desirable:
- Fast build/test cycles: Quick iteration and rapid feedback on changes through practices like CI/CD.
- Easy environment provisioning: On-demand access to dev, test, and staging environments without tickets.
- Observability: Logging, monitoring, and tracing to provide insight into apps and dependencies.
- Security scanning: Catch issues early in pipelines. SAST, DAST, infrastructure as code scanning.
- GitOps workflow: Infrastructure and app deployments managed through Git, with review processes.
- Self-service: Ability to carry out tasks without relying on other teams.
- Modular architecture: Components and services with clean interfaces to enable parallel work.
- Automated policies: Guardrails in place to prevent mistakes and enforce standards.
- Collaboration: Platforms that facilitate sharing, transparency, and communication across teams.
Least Desirable:
- Slow feedback cycles: Delays in testing or reviewing changes impede flow.
- Manual provisioning: Creating environments through ticketing and requests.
- Lack of observability: No insight into infrastructure performance, app logs, or failures.
- Security as an afterthought: Finding issues late in the process increases rework.
- Complex UIs: Hard-to-use interfaces that require extensive training.
- Monolithic apps: Tightly coupled code where changes impact other teams.
- Regional silos: Lack of standardization and collaboration across regions.
Overall, developers value self-service, automation, collaboration, and fast feedback to maintain flow and productivity. Legacy and complex platforms lead to friction and delays.
Q: What is a saga pattern, and can you talk about the importance of consistency in distributed applications?
A: The saga pattern is a way of managing data consistency in distributed systems and microservice architectures. It is an approach to achieving eventual consistency across services.
The key ideas behind sagas:
- Each service manages its data consistency according to its domain.
- A long-running business transaction involves calls to multiple services coordinated as a saga.
- If one service call fails, the saga executes compensating transactions to roll back changes and keep each service consistent.
- Rather than relying on distributed transactions and locks for consistency, each local transaction is made idempotent so it can be safely replayed during recovery.
- The system achieves eventual consistency once all compensating actions are complete across services.
Some benefits of the saga approach:
- Works well in distributed architectures vs. distributed transactions.
- Failure in one service doesn't block others.
- Services remain available and responsive during rollbacks.
- Allows for parallel processing across services.
- Maintains autonomy of services managing their own data.
So, sagas are essential for maintaining data consistency across microservices in a distributed system: they coordinate multi-service operations and run compensating actions in the event of a failure, as the sketch below illustrates.
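Here is a minimal sketch of an orchestration-style saga in Java. The step structure and the example actions in the comments are hypothetical; real systems typically run sagas through a message broker or a framework such as Akka rather than synchronous in-process calls:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class OrderSaga {

    /** A saga step pairs an action with the compensating action that undoes it. */
    record Step(String name, Runnable action, Runnable compensation) {}

    public void run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        try {
            for (Step step : steps) {
                step.action().run();   // e.g., reserveInventory(), chargePayment()
                completed.push(step);  // remember it for potential rollback
            }
        } catch (RuntimeException e) {
            // A step failed: compensate in reverse order to roll the saga back.
            while (!completed.isEmpty()) {
                completed.pop().compensation().run();  // e.g., releaseInventory()
            }
            throw e;
        }
    }
}

// Usage with hypothetical inventory/payments services:
// saga.run(List.of(
//     new OrderSaga.Step("reserve", inventory::reserve, inventory::release),
//     new OrderSaga.Step("charge", payments::charge, payments::refund)));
```

Because compensations may themselves be retried after a crash, both actions and compensations should be idempotent, which ties back to the replayability point above.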
Q: What’s tricky for developers working with event-driven microservices, and what do you think is the right level of abstraction for this type of architecture? How has the overall approach to event-driven microservices evolved, and what do you think the future of microservices looks like for developers in general?
A: Developing event-driven microservices architectures poses some unique challenges for developers. Some things to consider are:
Tricky aspects:
- Understanding eventual consistency: Events may take time to propagate, leading to stale reads; this requires a shift in mindset.
- Handling latency: Events are asynchronous, so developers must account for event ordering and processing delays.
- Designing atomic event flows: Careful design is needed to avoid partial execution of multi-step flows; sagas help coordinate rollbacks.
- Idempotent consumers: Event consumers must be idempotent to handle duplicate events during retries (see the sketch after this list).
- Tracing distributed flows: Following a business transaction across services is challenging. Good observability is critical.
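As referenced above, here is a minimal sketch of an idempotent consumer in Java, where duplicates are detected via a store of already-processed event IDs. The event shape and the in-memory store are hypothetical; a production system would persist processed IDs in the same transaction as the business state change:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Idempotent event consumer: processing the same event twice has no extra effect. */
public class PaymentEventConsumer {

    record Event(String id, String payload) {}  // hypothetical event shape

    // In-memory for illustration only; use a durable store updated atomically
    // with the business state in a real system.
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    public void onEvent(Event event) {
        // add() returns false if the ID was already present: a duplicate delivery.
        if (!processedIds.add(event.id())) {
            return;  // safe to drop; the effect was already applied
        }
        applyBusinessLogic(event);
    }

    private void applyBusinessLogic(Event event) {
        // e.g., credit an account exactly once per event
    }
}
```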
Evolving to higher abstractions:
- Frameworks like Kafka Streams provide higher-level abstractions over raw Kafka and make it easier to build stateful event streaming applications correctly (sketched after this list).
- Declarative approaches like Dapr's pub/sub help define event flows more readily than custom glue code. This reduces bolt-on eventing.
- Fully managed developer PaaS offerings like Kalix provide a high-level abstraction for building complex (or simple) event-driven systems on Kubernetes.
- Over time, event-driven is becoming the primary processing pattern over traditional database CRUD and transactions. This is a significant shift in mindset for developers.
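As a brief illustration of the higher-level abstraction Kafka Streams offers, here is a minimal topology in Java that counts events per key; the topic names and the per-customer count are illustrative. The DSL hides the partitioning, state stores, and fault tolerance that raw consumer code would have to manage by hand:

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class OrderCountApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-count-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Count orders per customer key from an input topic ("orders" is illustrative).
        KStream<String, String> orders = builder.stream("orders");
        KTable<String, Long> counts = orders.groupByKey().count();
        counts.toStream().to("order-counts",
            Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```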
Future trends:
- Event-driven architectures will continue to gain prominence as the complexity of interservice coordination and data consistency challenges drive the adoption of asynchronous event streaming.
- Expect higher-level abstractions that handle technical details, allowing developers to focus more on business logic.
- Tooling will improve around tracing, monitoring, and visualizing event flows across services to enhance observability.
- Integration between events, transactions, and queries will evolve as event streaming becomes the primary processing paradigm.
So, while event-driven microservices introduce new complexities, improved abstractions and observability will enable developers to be more productive in building these distributed applications.