Performance Engineering in the Age of Agile and DevOps
Performance engineering has to sit closer to development than ever for companies that wish to release faster than ever.
Many organizations are moving toward an Agile methodology to reduce time to market, improve the quality of deliverables, and align business goals with technology. Unlike the waterfall methodology, Agile adoption helps business groups provide feedback while the product is being built. As the software is released iteratively with shorter release cycles, performance testing windows are often shorter as well. This poses challenges to the conventional performance testing and engineering methodology adopted in the traditional waterfall model.
This article discusses different aspects of performance testing and engineering in an Agile environment and innovative ways to mitigate the various challenges.
Performance Testing in Waterfall vs. Agile SDLC
In the traditional waterfall model, performance testing happens towards the end, just before user acceptance testing (UAT) or in parallel with system integration testing (SIT). This often results in applications not meeting their performance requirements. Also, performance issues tend to be mitigated by expensive hardware sizing rather than code optimization, which sometimes results in production issues with a direct impact on end users' experience and the business.
In the Agile model, functionality is delivered incrementally. As release cycles are short, most activities (requirements gathering, development, and testing) are carried out in parallel, and performance testing of a release/sprint needs to be completed in a very short time frame (1 to 2 weeks, depending on the release cycle).
Performance assurance is an integral part of the SDLC. In Agile, it takes higher priority than other non-functional requirements, and performance assessment is done at every stage in an incremental way. PT&E becomes part of everyone's job, be they program managers, development managers, developers, SDETs, or the network operations team. The following diagram indicates PT&E activities in the Agile lifecycle.
Performance Engineering Activities in Agile
Performance Testing and Engineering Toolsets in the Agile World
Automation at every stage is key, so it is essential to have a toolset whose components integrate quickly with one another. Let us take a quick look at the widely used tool categories from a PT&E perspective:
- Application Lifecycle Management
- Source Code Management
- Build and Continuous Integration
- Code Analysis
- Load Simulation
- Performance Monitoring and KPI Trending
PT&E Challenges
The key challenges faced in a performance engineering exercise in the Agile SDLC can be attributed to people, process, and technology. The following table lists the key challenges:
Category | Description
Technology | A full-scale environment is not available for performance testing
Process | Limited time is available for environment setup, load test scripting, and test data setup to execute a full-scale load test
Process | Unstable builds: since development and testing activities run simultaneously, a break-off time is needed within a sprint; if major changes occur after the break-off time, those changes must be tested in the next sprint
Process | Limited or no support from developers to fix broken environments or to help in understanding how a feature works
Process | Very little documentation is available to understand feature design
People | Difficulty identifying talent with the right skill set
Let us look at how we can overcome these challenges by deploying a framework that facilitates effective performance risk analysis, early performance testing, and appropriate usage of technology and tools. The key strategic elements of this framework are:
Service Virtualization Usage in PT&E
Service virtualization can speed up performance testing considerably by using virtual services that emulate the behavior of components before they are available or fully developed. It fills the gaps left by missing system components by simulating their responses in a pre-determined manner, facilitating early performance validation. The most popular tools are CA LISA and TOSCA.
Service virtualization can be used to emulate third-party services, or even whole backend systems, for performance and integration tests.
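To make this concrete, here is a minimal sketch of a virtual service written in Python using only the standard library. The endpoint path, canned payload, and simulated latency are illustrative assumptions, not the behavior of any particular virtualization tool:

```python
# Minimal virtual service sketch: emulates a not-yet-built inventory API.
# The endpoint path, payload, and latency figure are hypothetical.
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

CANNED_RESPONSE = {"sku": "ABC-123", "inStock": True, "quantity": 42}
SIMULATED_LATENCY_SECONDS = 0.120  # mimic the real service's ~120 ms median

class VirtualServiceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/inventory/"):
            time.sleep(SIMULATED_LATENCY_SECONDS)  # emulate backend latency
            body = json.dumps(CANNED_RESPONSE).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

if __name__ == "__main__":
    # Point the application under test at this host/port instead of the
    # real (unavailable) backend component.
    HTTPServer(("0.0.0.0", 8080), VirtualServiceHandler).serve_forever()
```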
ML-Led Performance Validation
While ML is next-generation automation, Agile enables faster release cycles, so Agile and ML are complementary technologies. ML techniques can detect anomalies, draw patterns from sprint-level test results, and find defects. ML can also be leveraged for post-production analysis and for trend analysis across sprints. For example, a clustering model can group VM or container CPU usage to check whether all VMs/containers of an application are utilized at the same level, helping separate out abnormal VMs/containers that need further analysis. Similarly, a predictive model can be built to correlate resource usage with response time; the predicted value for a given load can then be compared with the actual value from the test, and any anomaly called out as part of the automated analysis process.
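As a rough illustration of both ideas, the sketch below uses scikit-learn with synthetic numbers standing in for real monitoring data; the container CPU samples, load levels, measured value, and 20% tolerance are all assumed for illustration:

```python
# Sketch: cluster per-container CPU usage to spot outliers, then fit a
# simple predictive model relating load to response time.
# All data here is synthetic; real inputs would come from your monitoring/APM feed.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Average CPU% per container over a test window (synthetic example).
cpu = np.array([[35], [38], [36], [37], [34], [88]])  # one abnormally hot container
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(cpu)
print("cluster labels per container:", labels)  # the lone outlier needs analysis

# Correlate load (virtual users) with observed p95 response time (ms).
load = np.array([[50], [100], [200], [400]])
p95_ms = np.array([210, 230, 280, 390])
model = LinearRegression().fit(load, p95_ms)

# Predict the expected p95 at 300 users, compare with the measured value
# from the test run, and flag a large deviation as an anomaly.
predicted = model.predict([[300]])[0]
measured = 520.0  # value observed in this sprint's test (example)
if measured > predicted * 1.2:  # 20% tolerance is an arbitrary choice
    print(f"anomaly: measured p95 {measured:.0f} ms >> predicted {predicted:.0f} ms")
```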
The same technique can be used even in production for live monitoring and alerting. Today, many AI/ML-powered APM tools such as Dynatrace, Instana, New Relic, and AppDynamics can be used as well.
Continuous Performance Testing Strategy
It has been observed that 70-80% of performance defects do not actually require full-scale performance testing in a performance lab. These defects can be found with a single-user or small-scale load test in a lower environment, provided a good APM tool is used to profile and record execution statistics at each stage of the SDLC. As Agile aims at progressive elaboration, a continuous performance validation approach needs to be deployed to progressively expand test coverage through multiple iterations. The continuous performance testing approach comprises three distinct phases:
Shift left enables teams to do the right level of load testing in the build/development phase. The target of the shift-left approach is to find and fix all code-related performance defects while development is in progress:
Make sure all features have performance success criteria alongside their functional requirements.
Make sure basic performance best practices are followed, like:
Timeout settings should be implemented at each interaction level (see the timeout sketch after this list):
API/service calls
DB calls
Unit performance tests of each component should be part of the "task completion" definition.
Use a good APM tool like Dynatrace, AppDynamics, New Relic, etc. in all environments.
Analyze performance data from unit and automated system tests and trend it build by build automatically (behind-the-scenes performance analysis).
Create a self-service job to execute performance tests at the component/API level, on demand and as part of the build pipeline.
Build success criteria should include performance thresholds as well, taken from isolation- or component-level tests. These jobs let developers quickly gain performance insight without waiting for a full-scale load test by performance engineering (a pipeline gate sketch follows this list).
Include performance monitoring and telemetry changes for operations in the development phase.
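Here is a minimal sketch of interaction-level timeouts in Python; the endpoint URL and timeout values are illustrative assumptions, and sqlite3 merely stands in for whatever database driver the application uses:

```python
# Sketch: explicit timeouts at every interaction level (values are
# illustrative; tune them per service SLA).
import sqlite3  # stands in for your real DB driver
import requests

# API/service call: never rely on the library default (often unbounded).
resp = requests.get(
    "https://payments.example.com/api/v1/status",  # hypothetical endpoint
    timeout=(0.5, 2.0),  # (connect timeout, read timeout) in seconds
)
resp.raise_for_status()

# DB call: bound how long we wait for a connection/lock.
conn = sqlite3.connect("app.db", timeout=5.0)  # give up after 5 s
```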
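And here is a sketch of a self-service performance gate that a Jenkins or GoCD stage could invoke; the results-file format and the 250 ms threshold are assumptions for illustration, not a prescribed standard:

```python
# Sketch: a pipeline gate that fails the build when a component-level
# performance threshold is breached. The results file format and the
# 250 ms threshold are assumptions.
import json
import sys

P95_THRESHOLD_MS = 250  # agreed per-component success criterion

def main(results_path: str) -> int:
    with open(results_path) as f:
        results = json.load(f)  # e.g. produced by your load-test tool
    p95 = results["p95_ms"]
    if p95 > P95_THRESHOLD_MS:
        print(f"FAIL: p95 {p95} ms exceeds threshold {P95_THRESHOLD_MS} ms")
        return 1  # non-zero exit fails the Jenkins/GoCD stage
    print(f"PASS: p95 {p95} ms within threshold")
    return 0

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```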
PT phase (performance testing in a performance environment): the performance environment should be used for performance evaluation of changes that need full-scale performance testing, such as:
Infrastructure upgrade
New components that need scalability and capacity assessment before production release
Changes in the technology stack, like moving a search engine from Solr to Elasticsearch
Database migration
Holiday- or peak season-specific feature or holiday readiness performance validation.
These tests in the performance environment are the same as a traditional performance testing exercise, but the activities need to be automated; this can be done with tools like Jenkins or GoCD pipelines. Performance analysis, data trending, and anomaly detection can also be automated, and using machine learning algorithms can cut performance analysis time by more than 50% (a trend-check sketch follows).
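A minimal sketch of such an automated build-over-build check, with illustrative history values and an assumed 20% regression tolerance:

```python
# Sketch: automated build-over-build trend check. Flags a build whose
# response time regresses more than 20% against the rolling baseline of
# recent green builds (all values are illustrative).
from statistics import median

history_p95_ms = [240, 236, 244, 239, 242]  # last five green builds
current_p95_ms = 318                        # this build's result

baseline = median(history_p95_ms)
if current_p95_ms > baseline * 1.2:  # tolerance is a team-chosen policy
    print(f"regression: {current_p95_ms} ms vs baseline {baseline} ms")
```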
Shift right enables performance testing in production. In the Agile methodology, since a full-scale performance test is not always run in a performance lab, production performance testing is highly critical to make sure the system can handle user traffic without any issues. It also helps verify how an application scales and how much infrastructure is needed for peak load. High availability and failover test cases can be executed directly in production.
A few points need to be considered when implementing a production performance testing process:
The application must be designed to differentiate between test traffic and real user traffic based on a cookie/header value. For test traffic, the application can invoke virtualized/stubbed endpoints to make sure the test does not create data in backend systems (see the routing sketch after this list).
Newer testing methodologies, such as A/B testing, must be evaluated for gauging performance:
New features/changes are made visible only to a certain set of users, and the percentage of users is increased gradually if there are no issues (see the rollout sketch after this list).
If issues are found, either the load is not increased further or the new features are pulled back.
Tests must be designed to mimic real user patterns in terms of:
User session pattern
Data
Geolocation
Mobile vs. browser vs. mobile app vs. IoT traffic (e.g., Amazon Alexa/Google Home or location-specific push/pull traffic)
The right load test tool must be selected, one that supports:
Dynamic scaling of load
Generating load from various locations and platforms
Some security rules must be updated to whitelist load generator IPs, so that load is allowed only from a limited set of IP addresses.
Collaboration with operations and external teams is another essential aspect.
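A minimal sketch of such routing follows, assuming a hypothetical X-Synthetic-Test header and stub endpoint, with Flask used for brevity:

```python
# Sketch: route synthetic load to stubbed backends so production tests
# never write into real downstream systems. The header name and the
# stub endpoint are hypothetical conventions, not a standard.
from flask import Flask, request
import requests

app = Flask(__name__)
REAL_BACKEND = "https://orders-internal.example.com/submit"
STUB_BACKEND = "https://orders-stub.example.com/submit"

@app.route("/checkout", methods=["POST"])
def checkout():
    is_test_traffic = request.headers.get("X-Synthetic-Test") == "true"
    backend = STUB_BACKEND if is_test_traffic else REAL_BACKEND
    resp = requests.post(backend, json=request.get_json(), timeout=2.0)
    return resp.text, resp.status_code

if __name__ == "__main__":
    app.run(port=8000)
```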
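And a sketch of the gradual rollout idea: hashing the user ID gives each user a stable bucket while the rollout percentage is dialed up, or back to zero if issues appear (the percentages and function names are illustrative):

```python
# Sketch: expose a new feature to a gradually increasing percentage of
# users. Hashing the user ID keeps each user's experience stable while
# the rollout percentage changes.
import hashlib

ROLLOUT_PERCENT = 5  # raise gradually, e.g. 5 -> 25 -> 50 -> 100

def feature_enabled(user_id: str) -> bool:
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket 0-99 per user
    return bucket < ROLLOUT_PERCENT

print(feature_enabled("user-42"))  # same answer every call for this user
```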
Live Monitoring
The right tools, alerts, and dashboards should be set up to monitor and analyze the test, and its impact on real user traffic, live.
An automated framework that uses machine learning algorithms to monitor patterns, detect anomalies, compare data sets against known patterns, and apply a predictive model to forecast performance as the load is increased dynamically will greatly reduce test analysis time and the chance of impacting real users.
Stop the test if the real user experience degrades beyond a predefined threshold (see the guardrail sketch below).
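A sketch of such a guardrail loop follows; the APM metrics endpoint, the load-tool stop hook, and the 800 ms threshold are placeholders, not real APIs:

```python
# Sketch: guardrail loop for in-production load tests. Polls a real-user
# p95 metric and aborts the test if experience degrades past a threshold.
# The two URLs below are placeholders for your APM and load-tool APIs.
import time
import requests

REAL_USER_P95_THRESHOLD_MS = 800  # assumed guardrail value

def fetch_real_user_p95() -> float:
    # Placeholder: query your APM's REST API for real-user p95 latency.
    resp = requests.get("https://apm.example.com/api/metrics/rum_p95_ms",
                        timeout=5.0)
    return float(resp.json()["value"])

def stop_load_test() -> None:
    # Placeholder: call your load tool's API to abort the running test.
    requests.post("https://loadtool.example.com/api/tests/current/stop",
                  timeout=5.0)

while True:
    p95 = fetch_real_user_p95()
    if p95 > REAL_USER_P95_THRESHOLD_MS:
        stop_load_test()
        print(f"aborted: real-user p95 {p95:.0f} ms breached threshold")
        break
    time.sleep(30)  # poll every 30 seconds during the test
```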
Post-test activities
Data cleanup, if any
Test impact analysis and recommendations
Communication to all stakeholders about test findings and next steps
The benefits of proactive and continuous performance testing and engineering using the shift-left and shift-right approaches include:
Identification of performance issues/bottlenecks in early phases, which helps in fast resolution with less cost. This is a proactive approach instead of a reactive approach.
Awareness of application performance status along with functionality even before actual performance testing in a dedicated performance environment.
It is easier to identify the root cause of a performance issue.
It is possible to fix performance issues at the code level during the development stage.
The development team gets adequate time to fix performance issues and optimize code.
Progress can be monitored at any time by looking into the product and sprint backlogs.
Software delivered will be of better quality and also in a relatively shorter duration.
PT & PE Team Involvement
Performance testing has higher precedence when it comes to Agile and the goal should be to test performance early and often. So the PT&E team needs to be involved right from the inception. The following table indicates a typical RACI matrix in any Agile implementation.
Key Activity | Business Owner(s) | Program/Portfolio Manager | Sponsor | PT&E Team
Identify all the stakeholders | C | A, R | S | C
Collect inputs based on business needs, market drivers, and feedback from end users | C | A, R | | C
Performance engineering best practices, guidelines, goals, and approach/strategy | C | C | | A, R
Assess current infrastructure capacity and performance | C | C | | A, R
Review and update existing workload models, or create new ones as needed | C | C | | A, R
Performance prototyping and tool POCs | C | C | | A, R
Define performance KPIs at the feature level | C | C | | A, R
Active review of intermediate design | C | A, R | | C
Code review from a performance POV | C | A, R | | C
Automate performance test activities | C | C | | A, R
Performance test framework integration with the build server | C | C | | A, R
Component performance testing | C | C | | A, R
Code profiling and debugging | C | C | | A, R
Performance test design | C | C | | A, R
Performance test execution | C | C | | A, R
Performance monitoring changes (telemetry requirements, dashboard updates, defining incidents and alerts for performance KPIs) | C | C | | A, R
Production performance analysis and finding potential issues/hot spots | C | C | | A, R
Planning and executing performance tests in production, if applicable | C | C | | A, R
Extensive Usage of APM Tools
The success of any Agile implementation depends on the right choice of toolset, with APM tools being key. There are multiple ways in which APM tools can solve Agile challenges:
- Isolate and fix performance issues in the coding phase at the sprint level
- Analyze performance trends before and after code deployments at the sprint level
- Continuous synthetic monitoring
- Increased online collaboration between dev and test teams; the need for detailed triage meetings in war rooms is minimized by tracing sessions and pinpointing exact issues
- Consistent metrics to analyze performance issues across environments, accessible to development teams, testing teams, and business users at all times
The following diagram depicts how APM tools help in every Agile lifecycle stage:
Performance Risk Assessment
User story performance risk assessment not only helps identify the key user stories that should be performance tested but also helps decide which user stories can be given lower priority. The following table indicates the parameters that determine the probability and impact of failure; the higher the risk factor, the higher the priority:
Probability/Impact | Parameter | User Story #1 | User Story #2
Factor #1 | Degree of change from the previous sprint | 1 | 1
Factor #2 | Stringent performance SLA | 5 | 4
Factor #3 | No. of components impacted | 4 | 4
Factor #4 | Any regulatory requirement | 3 | 3
Factor #5 | Heavy customizations and complex business logic | 5 | 4
Factor #6 | Technical complexity | 5 | 5
Weighted Probability | Probability | 5 | 4
Factor #1 | Is this a strategic change? | 1 | 2
Factor #2 | Business importance and criticality | 5 | 2
Factor #3 | No. of users impacted | 4 | 3
Factor #4 | Public visibility and impact on the brand | 4 | 3
Weighted Impact | Impact of failure | 4 | 2
Risk Factor | Risk factor | 20 | 8
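The table's final arithmetic appears to be risk factor = weighted probability x weighted impact (5 x 4 = 20 and 4 x 2 = 8 above). A minimal sketch of that step follows; since the table does not spell out how the individual factor scores are weighted into the probability and impact values, only the final multiplication is shown:

```python
# Sketch of the table's final step: Risk Factor = Weighted Probability x
# Weighted Impact. The per-factor weighting scheme is not given in the
# article, so it is not reproduced here.
def risk_factor(weighted_probability: int, weighted_impact: int) -> int:
    return weighted_probability * weighted_impact

print(risk_factor(5, 4))  # User Story #1 -> 20
print(risk_factor(4, 2))  # User Story #2 -> 8
```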
Right Skillset
In the Agile world, performance engineers need to be good developers with data science expertise so they can apply ML, automate tasks, and analyze and derive useful insights from data.
Skillset | Details
Ability to see the bigger picture | The performance engineer needs the ability to understand business requirements, priorities, and impact.
Strong technical and business acumen | The performance engineer should be able to translate business requirements into technical features and understand the business benefits that improved performance can deliver, e.g., customer retention and increased revenue.
Coding experience | The performance engineer needs development experience with a good understanding of architecture and infrastructure components.
Continuous improvement mindset | As a key objective of every release is better performance, the performance engineer needs to look for continuous improvement opportunities from both application performance and overall process perspectives.
Conclusion
Continuous performance assurance to improve application quality with reduced cycle times is essential in Agile environments. It can be achieved by embedding APM tools that leverage next-generation analytics, using service virtualization, testing early and often, automating continuously and intelligently, and deploying robust gating criteria for each lifecycle phase.