Cloud Cost Optimization: New Strategies for the AI Era
Explore new strategies for cloud cost optimization in the AI era, featuring insights from Stacklet CTO Kapil Thangavelu on FinOps, open source, and automation.
In today's volatile economic landscape, enterprises are scrutinizing their cloud bills more than ever. Platform teams are at the forefront of this challenge, tasked with finding innovative ways to optimize usage and drive down costs. To gain insights into this evolving field, we spoke with Kapil Thangavelu, co-founder and CTO of Stacklet and the creator and lead maintainer of Cloud Custodian. Let's dive into his perspectives on the latest trends in cloud cost optimization.
The Changing Landscape of Cloud Costs
Q: What's different about the cloud cost outlook today compared to recent years, from your point of view?
A: Thangavelu identifies several fundamental changes that have complicated and often increased cloud costs in recent years:
- Increased complexity: As organizations scale in the cloud-native era, different application teams leverage various cloud services, making usage increasingly complex.
- Hybrid and multi-cloud approaches: Many organizations are adopting these strategies, which integrate cloud services with existing on-premises systems and make it challenging to manage resources spread across disparate environments.
- Rise of AI applications: These applications fundamentally rely on cloud infrastructure and are incredibly resource-intensive. They often require high-performance GPUs, which are more expensive than standard CPU instances. Additionally, the volume of data needed by AI applications drives up processing and storage costs.
- Pressure for efficiency: Business leaders are under increased pressure to enhance efficiency, reduce waste, and gain better insights into cloud usage.
- Overprovisioning: This remains a significant contributor to runaway costs, and billing opacity makes it challenging to trace costs back to specific resources, especially as deployments grow and AI is involved.
- Shift in focus: While organizations previously rushed to adopt the latest features from cloud providers, there's now a greater emphasis on improved guardrails and best practices to eliminate and prevent cloud waste.
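As a rough illustration of the overprovisioning point above, the following Python sketch flags instances whose average CPU utilization stays below a threshold and estimates what each one costs per month. The `Instance` record, its field names, the 10% threshold, the 730-hour month, and all prices are assumptions made for the example; they do not come from the interview or from any provider's API.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple

HOURS_PER_MONTH = 730  # common approximation; an assumption, not a billing rule


@dataclass
class Instance:
    """Hypothetical inventory record; field names are illustrative only."""
    instance_id: str
    hourly_cost: float        # USD per hour (made-up figure)
    cpu_samples: List[float]  # CPU utilization percentages over a lookback window


def find_overprovisioned(instances: List[Instance],
                         cpu_threshold: float = 10.0) -> List[Tuple[str, float]]:
    """Return (instance_id, estimated monthly cost) for instances with low average CPU."""
    flagged = []
    for inst in instances:
        if not inst.cpu_samples:
            continue  # no metrics yet: skip rather than guess
        if mean(inst.cpu_samples) < cpu_threshold:
            monthly = round(inst.hourly_cost * HOURS_PER_MONTH, 2)
            flagged.append((inst.instance_id, monthly))
    return flagged


fleet = [
    Instance("gpu-train-1", 3.00, [85.0, 90.0, 78.0]),
    Instance("web-idle-2", 0.10, [2.0, 3.0, 1.0]),
    Instance("batch-new-3", 0.50, []),
]

flagged = find_overprovisioned(fleet)
print(flagged)
```

In practice CPU alone is a weak signal, especially for GPU or memory-bound AI workloads, so a real check would likely combine several metrics before anyone resizes or terminates a resource.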
The Challenge of Cost Correlation in Modern Architectures
Q: Why is it so challenging to correlate cloud costs in today's modern cloud-native systems and application architectures? Where do teams need help with the mandate to lower cloud costs?
A: Thangavelu highlights several factors that make cost correlation challenging:
- Growing service complexity: The sheer number of applications and services available in the cloud is constantly increasing, making optimization and control of usage difficult, especially at scale across multiple engineering groups.
- Ephemeral components: Cloud-native systems often use ephemeral components and dynamically scale microservices, making it hard to track costs and attribute them to specific resources or services over time.
- Distributed systems: When organizations run distributed systems with interconnected microservices, understanding the cost implications of individual components becomes exceptionally challenging.
- Usage optimization: While many organizations have some form of rate-based optimization, they need help with usage optimization. Effectively taking action requires better insights into services and deeper engagement with engineering teams.
- Lack of early implementation: If an organization doesn't implement cost management tools and processes from the start, it becomes increasingly challenging to understand spending patterns as the system grows.
- Real-time visibility: The goal is to provide real-time, holistic visibility into a company's cloud platforms, resources, and configurations to optimize spending, but achieving this is a significant challenge.
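One common way to approach the attribution problem described above is to roll billing line items up by an ownership tag and track how much spend cannot be attributed at all. The sketch below assumes a simplified line-item shape with a `team` tag; the field names and figures are hypothetical, and real billing exports carry many more columns.

```python
from collections import defaultdict

# Hypothetical billing line items (illustrative values only).
line_items = [
    {"resource_id": "i-001", "cost": 120.0, "tags": {"team": "search"}},
    {"resource_id": "i-002", "cost": 80.0, "tags": {"team": "ml"}},
    {"resource_id": "vol-003", "cost": 40.0, "tags": {}},
    {"resource_id": "i-004", "cost": 60.0, "tags": {"team": "ml"}},
]


def cost_by_tag(items, tag_key="team"):
    """Sum cost per tag value; items missing the tag land in an 'untagged' bucket."""
    totals = defaultdict(float)
    for item in items:
        owner = item["tags"].get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


totals = cost_by_tag(line_items)

# Share of spend that cannot be traced to an owner.
untagged_share = totals.get("untagged", 0.0) / sum(totals.values())

print(totals)
print(f"untagged share: {untagged_share:.1%}")
```

The untagged share is a useful number to watch: ephemeral components that are created and destroyed without tags tend to inflate it, which is one reason tagging is easier to enforce from the start than to retrofit.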
Open Source Solutions and Trends
Q: What do you see in the open source community in terms of encouraging trends/patterns/new technology approaches that are bringing this cloud cost equation into better control?
A: Thangavelu sees several positive developments in the open-source community:
- Policy standards: Projects like Cloud Custodian and FOCUS provide unified frameworks for managing cloud costs across providers.
- Automation: Open source tools are emerging that can automate cloud usage and controls across the infrastructure lifecycle to eliminate waste and enable good "cost hygiene."
- Community-driven innovation: Thriving open source communities, like Cloud Custodian, with over 450 active contributors, can provide more advanced cost management solutions faster than individual organizations.
- Cross-provider support: Tools like Cloud Custodian support all major cloud providers, allowing organizations to implement consistent governance across different environments.
- Real-time enforcement: Cloud Custodian, for example, allows users to define policies that can be automatically enforced in real-time across various cloud resources.
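To make the idea of policy-based enforcement more concrete, here is a minimal Python sketch of a policy modeled as a filter plus an action, run against an in-memory inventory. This is not Cloud Custodian's API: Custodian policies are written declaratively in YAML, and the `Policy` class, `run_policy` function, resource shape, and policy name below are all hypothetical. The action only returns a description of what it would do, so nothing is modified.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Resource = Dict[str, object]


@dataclass
class Policy:
    """Hypothetical policy: a name, a filter predicate, and a dry-run action."""
    name: str
    matches: Callable[[Resource], bool]
    action: Callable[[Resource], str]


def run_policy(policy: Policy, resources: List[Resource]) -> List[str]:
    """Apply the policy's action to every resource that passes its filter."""
    return [policy.action(r) for r in resources if policy.matches(r)]


# Example policy: storage volumes with no attachments are likely waste.
unattached_volumes = Policy(
    name="flag-unattached-volumes",
    matches=lambda r: r["type"] == "volume" and not r["attachments"],
    action=lambda r: f"mark-for-deletion:{r['id']}",  # dry run: describe, don't delete
)

inventory = [
    {"id": "vol-1", "type": "volume", "attachments": ["i-9"]},
    {"id": "vol-2", "type": "volume", "attachments": []},
    {"id": "i-9", "type": "instance", "attachments": []},
]

results = run_policy(unattached_volumes, inventory)
print(results)
```

Separating the filter from the action is what lets the same policy run first in a report-only mode and later with enforcement switched on.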
The FinOps Movement and Cross-Team Communication
Q: What is your general sense of the FinOps movement and how well or poorly are "finance" and "engineers" communicating today? What's broken? What needs to be improved?
A: Thangavelu notes both progress and ongoing challenges:
- Traditional silos: There's often still a disconnect between engineering teams using and provisioning cloud resources and finance teams controlling budgets.
- Lack of understanding: Engineers might not immediately grasp the cost implications of their decisions, while finance lacks insight into the reasons for these technical choices.
- Conflicting incentives: Engineers often prioritize innovation and time-to-market over financial prudence.
- Positive developments: Organizations like the FinOps Foundation have gained prominence and done an excellent job educating and driving better collaboration between various groups.
- Room for improvement: There's still a need for better adoption of governance and automation at scale, particularly in cost governance for cloud usage, including contextual information and automated remediation workflows.
- Behavior change: Improved governance and automation can drive action and reinforce cost-aware behavior among engineering teams.
Stacklet's Approach to Cloud Cost Optimization
Q: How does Stacklet fit into this trend of cloud costs and FinOps? What's new and different via Stacklet regarding cost savings, utilization, fewer moving parts, and not burning money on idle resources?
A: Thangavelu outlines Stacklet's approach to addressing cloud cost challenges:
- Focus on usage optimization: While many teams start with rate optimization, Stacklet emphasizes usage optimization, which requires close collaboration with internal engineering teams to align cloud resources with business needs.
- Addressing common challenges: Stacklet aims to tackle issues such as fragmented visibility, manual processes, misaligned organizational goals, and the need for timely engineering actions.
- Comprehensive visibility: The platform provides an inventory of all cloud resources and configurations in real-time, combining this with policy execution data for an accurate, contextualized view of cloud infrastructure management.
- Best practice policies: Stacklet offers pre-defined policies to address common security, operations, and cost optimization use cases.
- Developer-centric approach: The platform focuses on the developer experience, integrating with existing workflows and collaboration tools to reduce the burden of change management.
- Automated workflows: Engineer-centric workflows make eliminating waste faster, allowing teams to focus on innovation.
- Intelligent communication: The platform automatically groups related notifications and routes them to the right stakeholders.
- Preventing recurrence: Stacklet's automated guardrails aim to prevent waste from recurring.
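The notification grouping and routing idea above can be sketched in a few lines of Python. This is an illustration of the general technique only, not a description of how Stacklet implements it: the finding fields, owner names, and the `platform-team` fallback are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical policy findings; "owner" would typically come from resource tags.
findings = [
    {"policy": "idle-instance", "resource": "i-1", "owner": "team-search"},
    {"policy": "unattached-volume", "resource": "vol-2", "owner": "team-ml"},
    {"policy": "idle-instance", "resource": "i-3", "owner": "team-search"},
    {"policy": "idle-instance", "resource": "i-4", "owner": None},
]


def group_for_delivery(findings, fallback="platform-team"):
    """Group findings per owner, then per policy, so each owner gets one digest.

    Findings without an owner are routed to a fallback recipient instead of
    being dropped.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for finding in findings:
        owner = finding["owner"] or fallback
        grouped[owner][finding["policy"]].append(finding["resource"])
    return {owner: dict(by_policy) for owner, by_policy in grouped.items()}


digests = group_for_delivery(findings)
for owner, by_policy in digests.items():
    print(owner, by_policy)
```

Sending one digest per owner rather than one message per resource keeps the signal readable, which matters if the goal is for engineers to act on it.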
Advice for Enterprises
Q: What kind of advice would you give enterprises that feel the cloud providers have too much leverage against them in the cost equation? What can the enterprise do to put themselves in a better position?
A: Thangavelu offers several recommendations for enterprises:
- Consider multi-cloud: Adopt a multi-cloud strategy to enhance negotiating power, optimize pricing, and reduce dependency on a single provider.
- Compare offerings: Analyze different cloud platforms to find the most cost-efficient and effective combination of services for specific use cases.
- Workload optimization: Recognize that different workloads may be better suited to different cloud environments based on technical requirements, compliance needs, and performance criteria.
- Implement robust governance: Use tools that enable real-time detection and visualization of policy violations and automatically trigger remediation workflows.
- Leverage automation: Implement solutions that streamline and automate the complex processes of usage optimization and governance.
Conclusion
As cloud-native architectures and AI applications continue to reshape the technology landscape, cloud cost optimization remains a critical challenge for platform engineers. By embracing open-source solutions, fostering collaboration between finance and engineering teams, and leveraging automation, organizations can navigate this complex terrain more effectively.
The key lies in balancing innovation and financial prudence, ensuring that cloud resources are used efficiently without stifling technological progress. As enterprises continue to scale their cloud operations, tools and strategies that provide real-time visibility, automate policy enforcement, and optimize usage will be crucial in managing costs while driving innovation.
Opinions expressed by DZone contributors are their own.