Utilizing Google Cloud to Enable the Future of Intelligent Software Testing
A few years ago, our team set out to solve a growing problem for today’s agile development teams: web application testing. We focused on QA automation because it is the most acute pain point for many teams hoping to realize their full DevOps potential. While there are dozens of solutions for testing application quality, most were built for a different era of installed, on-premises software that changed infrequently. In a cloud-native world that expects rapid development cycles, it was clear that a new solution was needed.
Our goal was to create a test automation platform that would fit seamlessly into today’s fast-paced Quality Engineering (QE) landscape, a discipline that urges teams to build a culture of quality in which the silos that once kept QA and DevOps at odds no longer exist. We also wanted to simplify the test automation problem through machine intelligence. We knew that modern QE teams wanted to avoid the burden of installing, configuring, managing, and scaling a testing solution, so we built our platform as a cloud-native SaaS.
To do this, we evaluated several cloud providers and ultimately chose Google Cloud Platform (GCP) as our infrastructure provider based on the strength of its serverless compute, data analytics, and machine learning offerings.
Here are five of the most significant ways Google has helped our team deliver and scale an enterprise-grade test automation solution in the cloud.
1. Configuration Management
From day one, our team took advantage of several of GCP’s reliable, fully managed services. For example, we run tests on Google Kubernetes Engine (GKE), serve APIs from App Engine, and process data in Dataflow, all without the added stress of server configuration, patching, performance tuning, and scaling. Using these Google services has allowed our team to keep operational overhead to a minimum, leaving more developer cycles for feature work.
2. System Operations
We use Google Cloud Monitoring for rich real-time metrics that help us understand customer usage and test performance. When we encountered our very first incident, we had immediate access to the tools we needed (Cloud Monitoring and Cloud Logging) to quickly diagnose and resolve the issue. We have since instrumented our GKE services with Cloud Monitoring custom metrics to build powerful dashboards that provide a deeper, real-time understanding of overall platform performance. These dashboards aggregate data across hundreds of thousands of time series to determine the health of our platform’s testing subsystems.
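As a rough sketch of what that instrumentation looks like, a service can push a data point to a custom metric with the google-cloud-monitoring library. The project ID, metric name, and label below are hypothetical, not our platform’s actual metrics:

```python
import time

from google.cloud import monitoring_v3

PROJECT_ID = "my-gcp-project"  # hypothetical project ID

client = monitoring_v3.MetricServiceClient()

# Custom metrics live under the custom.googleapis.com/ prefix.
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/testing/active_test_runs"  # hypothetical
series.metric.labels["subsystem"] = "browser-tests"
series.resource.type = "global"

# One data point: the number of tests running right now.
now = time.time()
interval = monitoring_v3.TimeInterval(
    {"end_time": {"seconds": int(now), "nanos": int((now % 1) * 1e9)}}
)
series.points = [
    monitoring_v3.Point({"interval": interval, "value": {"int64_value": 42}})
]

# A single API call pushes the point; dashboards and alerts build on top.
client.create_time_series(name=f"projects/{PROJECT_ID}", time_series=[series])
```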
3. Low-Latency Networking and Scalability
For every networking need our team has encountered, GCP has offered a solution, from Cloud DNS for domain hosting to Cloud VPN for hybrid connectivity. Many software startups focus first on building their minimum viable product (MVP) and only later on making it work at scale. Since our MVP was built natively on GCP, it simply scaled along with our business. For example, the core testing engine is built on GKE; it has grown from the initial few testing cores to the thousands of cores it uses today without modification.
Our customers are generating terabytes of test output. Our test automation platform writes this data to Google Cloud Storage (GCS), and the test artifact processing systems scale without modification to handle billions of test output files. Additionally, serverless GCP services like Google Cloud Functions and Cloud Pub/Sub scale automatically to help analyze those outputs. The platform can then process millions of work units at an economical rate, without any developer or operations intervention, simply because the architecture was built on highly scalable GCP services from the start.
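To make the fan-out concrete, here is a minimal sketch (with hypothetical project, topic, and field names) of a first-generation Python Cloud Function that fires for each new Cloud Storage object and hands the work to Pub/Sub:

```python
import json

from google.cloud import pubsub_v1

PROJECT_ID = "my-gcp-project"        # hypothetical project ID
TOPIC_ID = "test-artifact-events"    # hypothetical topic

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)


def process_artifact(event, context):
    """Background function triggered by each object finalized in a bucket."""
    # The GCS event carries the bucket and object name; we forward a
    # reference so subscriber workers can fan out the analysis.
    message = {"bucket": event["bucket"], "name": event["name"]}
    publisher.publish(topic_path, json.dumps(message).encode("utf-8"))
```

Pub/Sub buffers the work, so bursts of uploads are absorbed without any operator intervention.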
4. Scalable Data Processing
A differentiator of our intelligent test automation platform is the rich diagnostic data available for every test run: users can dig into every step of a failed test to quickly and easily identify and fix the root cause. To provide this level of detail, we built the platform on Dataflow as an efficient way to continuously process the variety of data streams generated by tests, provide rapid access to distilled results, and enable retrospective analysis to drive future product decisions. Dataflow’s autoscaling lets the pipelines efficiently handle bursty workloads when many large test suites execute simultaneously.
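A simplified sketch of that style of streaming pipeline in the Apache Beam Python SDK follows; the subscription, table, and event fields are assumptions for illustration, not our production pipeline:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical resource names.
SUBSCRIPTION = "projects/my-gcp-project/subscriptions/test-step-events"
TABLE = "my-gcp-project:test_analytics.step_results"

options = PipelineOptions(streaming=True)  # autoscaling is set at job launch

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadEvents" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
        | "Parse" >> beam.Map(json.loads)
        # Surface failed steps quickly for diagnostics.
        | "FailedOnly" >> beam.Filter(lambda e: e.get("status") == "failed")
        | "Write" >> beam.io.WriteToBigQuery(
            TABLE,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```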
We have also integrated BigQuery into the platform to capture and analyze data from both test executions and user activity in the front end. It serves as a data warehouse where we can rapidly iterate on analyses and build a comprehensive picture of the platform from terabytes of raw data. Internally, we use this information to guide the business and make sound data-driven decisions, ranging from improving core test execution capabilities to monitoring new feature adoption. Externally, our customers can integrate their workspace with BigQuery for a more detailed analysis of their individual test runs and test coverage.
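For example, a question like "which test suites failed most often this week?" becomes a short script against the warehouse; the dataset, table, and column names here are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical warehouse schema for illustration.
query = """
    SELECT test_suite,
           COUNTIF(status = 'failed') / COUNT(*) AS failure_rate
    FROM `my-gcp-project.test_analytics.test_runs`
    WHERE run_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    GROUP BY test_suite
    ORDER BY failure_rate DESC
"""

for row in client.query(query).result():
    print(f"{row.test_suite}: {row.failure_rate:.1%}")
```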
5. Enterprise-Grade Security
The final area, and arguably the most important, is security. We have made protecting customer data a priority at every phase of our evolution. While security can introduce complexity and overhead to software development, we have benefitted from being a modern, cloud-native company, building on the bedrock of Google’s two decades of cloud best practices.
Google Cloud Storage and Cloud IAM are the core components that keep millions of customer files organized, secure, and manageable. For example, all customer test inputs and outputs are stored in per-customer Cloud Storage buckets. Each customer bucket is independently encrypted with its own Cloud KMS encryption key for added security at rest.
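A minimal sketch of this pattern with the google-cloud-storage client, assuming hypothetical project, customer, and key names:

```python
from google.cloud import storage

PROJECT_ID = "my-gcp-project"   # hypothetical IDs throughout
CUSTOMER_ID = "acme-corp"
KMS_KEY = (
    f"projects/{PROJECT_ID}/locations/us/keyRings/customer-keys/"
    f"cryptoKeys/{CUSTOMER_ID}"
)

client = storage.Client(project=PROJECT_ID)

# One bucket per customer keeps access control and cleanup simple.
# The KMS key must live in the same location as the bucket.
bucket = client.create_bucket(f"test-artifacts-{CUSTOMER_ID}", location="us")
bucket.default_kms_key_name = KMS_KEY  # every new object is encrypted with it
bucket.patch()
```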
Architecting with separate buckets has simplified access control, auditing, and data cleanup. For example, if a GDPR request requires expunging customer data, destroying a single KMS key immediately and irrecoverably crypto-shreds the bucket and its millions of files. In legacy cloud operations, by comparison, complex database queries followed by millions of delete operations would be needed to identify, isolate, and remove all relevant files; on GCP, it is a single API call.
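With the google-cloud-kms client, that single call might look like the sketch below (the key path is hypothetical). Note that KMS schedules the key version for destruction rather than deleting it instantly, which leaves a short window to catch mistakes:

```python
from google.cloud import kms

client = kms.KeyManagementServiceClient()

# Hypothetical key path; in practice it comes from the customer record.
version_name = client.crypto_key_version_path(
    "my-gcp-project", "us", "customer-keys", "acme-corp", "1"
)

# Destroying the key version makes every object encrypted with it
# unreadable: one call in place of millions of deletes.
client.destroy_crypto_key_version(request={"name": version_name})
```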
Additionally, per-customer buckets simplify access control decisions for both user and workload file access by eliminating any question of who owns which files. Using Cloud IAM Conditions and Workload Identity, mabl services run under tightly constrained service accounts, adhering to the principle of least privilege: each service can access only the minimum Cloud Storage data relevant to that service and customer.
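As a sketch of that pattern, a conditional IAM binding can limit a service account to a single prefix within a customer bucket; the bucket, service account, and condition below are hypothetical, and the bucket needs uniform bucket-level access enabled:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("test-artifacts-acme-corp")  # hypothetical bucket

# IAM Conditions require policy version 3.
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.version = 3
policy.bindings.append(
    {
        "role": "roles/storage.objectViewer",
        "members": {
            "serviceAccount:artifact-reader@my-gcp-project.iam.gserviceaccount.com"
        },
        # Grant read access only to objects under the results/ prefix.
        "condition": {
            "title": "acme-results-only",
            "expression": (
                'resource.name.startsWith("projects/_/buckets/'
                'test-artifacts-acme-corp/objects/results/")'
            ),
        },
    }
)
bucket.set_iam_policy(policy)
```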
It has been five years since we started building mabl, and we have assembled enterprise-grade GCP services into a scalable core architecture with less time and money than it would have taken to build ourselves. Being cloud-native gave us the luxury of getting out of PoC and beta in record time. This approach has enabled us to invest most of our human and financial capital into the innovation and user experience that differentiate our test automation platform.
The Google Cloud Platform provides the solutions that enabled our team to assemble a world-class infrastructure.