What Is a Performance Engineer and How to Become One: Part 1
Considering the current expectations and requirements of many companies and stakeholders, learn some of the skills required for performance engineers.
Nowadays, many performance testers, even those with years of experience in IT, are still unsure about the technologies they have worked with on their projects. A performance engineer is a performance testing and engineering expert with in-depth knowledge of load-testing tools such as LoadRunner, JMeter, Neoload, Gatling, and K6, along with extensive experience in specialized skills. Most performance engineers have spent years submitting RFPs, developing scripts, executing tests, analyzing results, monitoring and tuning systems, and researching their specific project and product domains, and have gained a very high level of expertise in them. Performance engineers draw on that experience to provide observations and recommendations, with appropriate fixes, that improve performance for the private and public companies and stakeholders they work with. They help resolve issues and blockers so that applications and systems meet their SLAs and advance business interests. Performance engineers can work across all fields and cutting-edge technologies such as Java, Python, IoT, cloud, blockchain, microservices, SAP, AI, and Salesforce, and they are likely to be most in demand in fast-developing sectors such as BFSI, healthcare, e-commerce, and insurance.
Performance testing and engineering become challenging for even the most experienced practitioner whenever a new technology is involved, and all of us, however many years of experience we have, must spend a lot of time learning it. For example, adding a new cloud to an existing architecture, performance testing an SAP application, or testing a microservices application written in Go may require months of formal study, POCs, learning best practices and expert opinions, continuous learning, and extensive hands-on experience before you can actually test and fix performance. The same applies to measuring performance during migrations and in new domains.
To become a performance engineer, one must have in-depth, up-to-date knowledge and experience of how to get things done, from a performance standpoint, in any application or product. Develop a good understanding of industry best practices and processes, including the technical methods, practices, and strategies that help build performance testing, performance engineering, and application performance management processes end to end.
A performance engineer has to be involved in every phase of the performance testing life cycle and the performance engineering life cycle, each with many technical standards. This might include NFR gathering, submitting RFPs, selecting tools, identifying the tech stack, architecture design, hardware sizing, capacity planning, workload modeling, selecting monitoring tools, participating in code reviews, code profiling, and deciding on the coding standards or platforms to be used. To be effective, a performance engineer needs broad and deep technical knowledge to make good decisions on the performance front. However, technical knowledge isn’t enough: they must also have the soft skills to manage multiple projects and people simultaneously. There’s no one-size-fits-all definition because different projects may require different technical knowledge, but there are some common skills that all projects will require.
Considering the current expectations/requirements of many companies and stakeholders, let's highlight some of the skills required for performance testers/engineers/architects/SMEs below.
Learn Basics First
As a beginner in performance testing and engineering, there are several key things that one must know. Performance testing and engineering is a multi-faceted activity, and one must learn all of its aspects from different perspectives. From my personal experience, I started to learn basic skills first because most of the basic skills are easier to learn, and advanced skills are based on the basic concepts. If you want to start learning about performance testing and engineering, I believe you should start with the basics, such as what performance testing is, performance testing types, and its purpose, why we need it, what common mistakes people make while performing it, how to choose a performance testing tool, what tools are commonly used to conduct performance testing, how protocols work, how internal and external components in a system communicate with one another, and so on.
For example, if we don't know how an OS works, on both Windows and Linux, we will not be able to identify and troubleshoot OS-related performance issues. In the same way, if we don't know how the Tomcat server works, how to set its heap size, which parameters to tune for better performance, and how to deploy applications on it, it becomes really hard to know where to look when a performance problem shows up on a Tomcat application server.
When executing performance tests, it can also be very difficult to understand what is going on without a solid basic understanding of operating systems, servers, networks, general software architecture, database concepts, and performance terminology (for example, what a hit, throughput, response time, and latency are; what web, app, and database servers do; and what a request and a response contain). Knowing how the CPU, disk, and RAM work, along with the role of CDNs, DNS, proxy and reverse proxy servers, load balancers, and API gateways, helps performance engineers quickly identify problems from the relevant metrics in monitoring and profiling tools.
In addition, it is important to study system design concepts and enterprise architecture patterns and to know the basic differences between on-prem and cloud, and between monolithic and microservices architectures. Understanding how they work can give valuable insights. Depending on the existing architecture, we can create a strategy for testing the application in different ways to assess performance under varying load conditions and workloads.
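To make a few of these terms concrete, here is a minimal Python sketch that summarizes a list of response times into hits, throughput, average, and 90th percentile. The sample values, the 5-second test duration, and the helper names `percentile` and `summarize` are assumptions made up for illustration; real load-testing tools compute these figures for you and may use a different percentile method.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(response_times_ms, test_duration_s):
    """Summarize one test run: hit count, throughput, and response-time statistics."""
    return {
        "hits": len(response_times_ms),
        "throughput_per_s": len(response_times_ms) / test_duration_s,
        "avg_ms": mean(response_times_ms),
        "p90_ms": percentile(response_times_ms, 90),
        "max_ms": max(response_times_ms),
    }


# Hypothetical response times (ms) for 10 requests completed in a 5-second window.
samples = [120, 95, 110, 480, 105, 130, 98, 102, 115, 125]
summary = summarize(samples, test_duration_s=5)
print(summary)
```

In this made-up sample, the single 480 ms outlier pulls the average (148 ms) above the 90th percentile (130 ms), which is one reason to look at percentiles and the maximum alongside averages rather than at averages alone.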
Software Development Experience With Any One Programming Language
Performance engineers must have a good understanding of, and experience in, software development, in any language with which they are comfortable. The best choice is the language in which your application is designed and developed, which is often Java or Python. Almost all performance testing tools available in the market require programming knowledge to create and customize test scripts, analyze results, and automate performance tests as per business requirements. With experience in software development and code profiling tools, performance engineers can easily understand the codebase, identify potential performance bottlenecks, and work closely with developers to implement permanent performance fixes. Additionally, software development experience helps performance engineers understand the impact of code changes, enabling them to come up with accurate suggestions and recommendations.
Get a Detailed Understanding of Application Design and Architecture
Can we make changes to a system/application without a clear understanding of its underlying architecture? My answer is, "No." What if we already have an application without a clear architecture? Having to create an architecture diagram from an existing application before performance testing starts is actually more common than anyone would like.
I know many performance engineers and developers have faced this request: “Explain the architecture of your application.” This may sound overwhelming at first, but you can break the architecture down into components such as the user interface, the database, the backend server that runs your business logic, messaging servers, API gateways, load balancers, the caching layer, and the security layer. Application architecture and design is about how you go about building your software system.
There are two primary things that you need to decide when working with application architecture: first, how you will break things down into different modules/sub-systems, and second, how these modules/components will talk to each other. The technologies, tools, methods, languages, components, and frameworks you select, and how you put them together, are what make up your architecture.
From a performance context, I see application architecture as a flow diagram that defines user flows and transactions, from where the user enters all the way down to the CPU, RAM, disk, and network of the servers and components that are part of the environment. Many layers can be stacked when developing a web application. For example, a front end and a back end are required, and some things you must be able to see and interact with, while others are optional or required only as needed.
Whether it is a monolithic, microservices, cloud, mobile, or event-driven serverless architecture, understanding architecture and application design is a must for any developer and performance engineer who wants to develop stable, secure, robust, and scalable enterprise applications. Your application's behavior and performance depend on the architecture upon which it is built.
Write Technical Specifications/SLAs: Collaboratively Work With Developers/DBAs and Functional QAs
In performance projects across many companies, the SLAs are neither straightforward nor complete. Things become a lot easier if there is existing user behavior data for the application. If there is none, the available stakeholders will inevitably have a biased view of application usage, so we start with a baseline and benchmark.
To write a performance SLA, we first have to understand its purpose in terms of performance goals. Additionally, we have to make sure the requirements are Specific, Measurable, Achievable, Relevant, and Time-bound (SMART). Whatever is chosen must be measurable in the real system or a prod-like system, and utmost care must be taken to ensure that the performance SLAs are clear, concise, and well-defined.
Performance testing teams can use NFR gathering or client questionnaire documents to collect all NFRs and SLAs, and a predefined standardized template (if the customer provides one) for consistency and traceability. In the majority of cases, when performance engineers work with a brand new application/product, it is simple to say that the application has 10,000 concurrent users with a maximum capacity of 30,000 users, but what exactly does that mean? This is where the business side of things comes in: we need to ask all the teams what TPS has to be achieved, what the usage pattern is, and what percentage of users perform which actions at a given point in time (for example, searching, logging in, adding to cart, or browsing around). Performance engineers have to collaborate effectively with stakeholders, QAs, developers, DBAs, and business teams to gather inputs and validate functional and performance requirements, which yields the key metrics and acceptance criteria that direct development, performance testing, and engineering.
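One common way to turn a statement like "10,000 concurrent users" into a throughput target is Little's Law, N = X × (R + Z), where N is the number of concurrent users, X the throughput, R the average response time, and Z the think time. The sketch below is only an illustration: the 2-second response time, the 8-second think time, the transaction mix, and the function names are assumed values, not figures from a real project.

```python
def required_throughput(concurrent_users, avg_response_s, think_time_s):
    """Little's Law: N = X * (R + Z), so X = N / (R + Z)."""
    return concurrent_users / (avg_response_s + think_time_s)


def split_by_mix(total_tps, mix_pct):
    """Split a total throughput target across transactions using percentage shares."""
    if sum(mix_pct.values()) != 100:
        raise ValueError("transaction mix must add up to 100%")
    return {name: total_tps * pct / 100 for name, pct in mix_pct.items()}


# Assumed inputs for illustration only.
total_tps = required_throughput(concurrent_users=10_000, avg_response_s=2, think_time_s=8)
mix = {"search": 50, "browse": 25, "add_to_cart": 15, "login": 10}
targets = split_by_mix(total_tps, mix)

print(f"Total target: {total_tps:.0f} requests/s")
for name, tps in targets.items():
    print(f"  {name}: {tps:.0f} requests/s")
```

With these assumed inputs the total works out to 1,000 requests per second, which the mix then divides per transaction. The point is less the numbers than the questions they force: without an agreed response time, think time, and usage mix, a concurrent-user figure cannot be translated into a workload model.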
Create POCs as Required With Business Requirements
In performance testing, a proof of concept is a quick performance testing assessment carried out when a client wants to go over all the options and check that performance tests will be beneficial before signing a contract with a company. It can be used to demonstrate whether the performance SLAs are technologically feasible. In general, a successful proof-of-concept plan can make or break the performance testing effort.
Proof of concept is a vital step in performance testing since it gives many benefits. Giving time to the PoC stage allows you to make better decisions, prioritize features, and optimize the performance testing process, which eventually saves time and money. Normally, a PoC should not take more than two weeks to complete; otherwise, it becomes more than just a proof of concept.
We need a team of performance testers working on the proof of concept to check whether specific performance tools like LoadRunner, JMeter, Neoload, Gatling, etc. can support the application.
As part of the POC, the performance engineers will understand the application architecture, which technologies are used to build the application, what type of application it is, how complex it is, and which communication mechanisms it uses. In addition, the POC will help us identify which tool is compatible with recording the use cases in the application.
As performance engineers, we need to take one simple, one medium, and one complex use case flow, develop and enhance test scripts using multiple tools, run baseline, benchmark, and smoke tests with multiple users for 30 minutes to 1 hour, and, once the tests are completed, submit a detailed report with pass and fail results. Based on the proof of concept, the client then decides which tool and protocol bundle to buy and how many labor hours are required to complete performance testing of the application.
Experience With Multiple Performance Testing Tools
Anyone can start by learning any tool of their choice (LoadRunner or JMeter, for example), as many organizations and projects consider that an entry criterion for getting into performance testing. Whichever tool we use for performance testing (LoadRunner, JMeter, Gatling, Neoload, etc.), maximum customer satisfaction and user experience is what we should aim for in order to take the business to new heights of success. I cannot overstate the number of tools you are expected to know with hands-on expertise. With new technologies introduced all the time, you must keep yourself up to date with multiple performance testing tools, because the choice of tool depends on many factors like budget, timelines, ease of use, flexibility, integrations, protocol support, tech stack, tool support, etc.
Learn performance testing tools from the best instructors and from rich YouTube tutorials, and give yourself the best chance to become an expert in performance testing and engineering. Don't worry much about the pros and cons of any one tool. In view of the constantly changing requirements of clients and companies, it is essential to learn multiple performance testing tools if you wish to excel in performance testing and engineering.
Understand the Business Domain
Domain knowledge enables performance engineers to communicate effectively with stakeholders, developers, database administrators, and other team members. Performance engineers with domain knowledge can design and develop test scripts that cover all relevant scenarios and verify that the application meets the customer's requirements. In simple terms, if you don’t have enough knowledge of the domain you are working in, you will end up focusing on the wrong things (unnecessary performance optimization is one of the most popular choices). Diversified experience across several domains can be beneficial, because exposure to a variety of technologies and problem sets may widen a performance engineer's perspective and provide a greater skill set. To be effective in a continuously changing software engineering industry, developers and performance engineers must have solid domain knowledge.
Be a Part of Early Performance Testing
Performance testing typically starts after functional testing is completed, but it is not just for completed projects and fully functionally tested applications. There is value in performance testing individual units or modules from the early stages of development. Early performance testing is an approach in which performance tests are run whenever a feature, module, API, or microservice of the application is completed, in parallel with development.
Early performance testing greatly improves your chances of detecting performance bottlenecks early, so that they don't show up later in production. If performance engineers don't test performance while the code is being written, they will end up with many performance degradations. Even small changes to your application can affect its performance, and the sooner you identify performance-related issues, the easier and cheaper they are to address and fix. That's exactly what early performance testing can help you with.
Identify Early Mitigation Throughout PTLC: Baseline and Benchmark
Identifying and avoiding performance testing risks and addressing performance issues early in the PTLC involves developing realistic and relevant load test scenarios based on real use cases. By creating test scenarios that closely mimic real-world usage conditions, performance engineers can identify potential performance bottlenecks early in the development cycle. These scenarios should represent a variety of user behaviors, traffic patterns, and system loads in order to accurately evaluate how the application works under diverse load conditions.
With baseline and benchmark testing, performance engineers can reduce the risk of unexpected slowdowns or failures in production by proactively identifying and fixing performance issues using realistic load testing scenarios. It is also important to make sure that the test environment and configurations are comparable to real-world conditions. To manage performance testing risks and mitigate issues, we must identify system/application weak points, design realistic performance testing scenarios, monitor everything with APM, analyze and interpret the results, and implement optimizations to ensure the desired application experience. This helps diagnose the root causes of performance problems, enabling timely interventions and optimizations so that the system meets the desired performance standards.
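As a rough illustration of how a baseline can be used, the sketch below compares per-transaction response times from a later run against a stored baseline and flags anything that degraded by more than a tolerance. The transaction names, timings, the 10% tolerance, and the `find_regressions` helper are all assumptions for the example, not a standard from any tool.

```python
def find_regressions(baseline_ms, current_ms, tolerance_pct=10):
    """Return transactions whose response time grew by more than tolerance_pct."""
    regressions = {}
    for name, base in baseline_ms.items():
        current = current_ms.get(name)
        if current is None:
            continue  # transaction not measured in the current run
        change_pct = (current - base) / base * 100
        if change_pct > tolerance_pct:
            regressions[name] = round(change_pct, 1)
    return regressions


# Hypothetical 90th-percentile response times (ms) from two test runs.
baseline = {"login": 400, "search": 800, "add_to_cart": 600}
current = {"login": 420, "search": 1000, "add_to_cart": 630}

regressions = find_regressions(baseline, current)
print(regressions)  # {'search': 25.0}
```

A check like this only means something when both runs used comparable environments, data volumes, and load profiles, which is why the test environment and configuration matter as much as the numbers.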
Knowledge and Experience in Multiple Code Profiling Tools
In most application codebases, no matter how large they are, there are many places where something is slow. Experience with several profiling tools helps developers and performance engineers identify the hotspots, especially the slow functions, methods, and calls: where the time is spent, why it is spent there, and what can be done about it. Profiling tools help us identify potential bottlenecks in the areas of code that use resources inefficiently, and they provide metrics on the time taken to execute certain parts of the code, as well as CPU, memory, and other system resource usage. Code profiling should be a mandatory phase in any performance testing and engineering life cycle so that performance bottlenecks are identified and eliminated early and don't create big problems later in production.
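As a small illustration of what a profiler reports, the following sketch uses Python's built-in `cProfile` and `pstats` modules on a deliberately inefficient function. The functions `find_duplicates_slow`, `handle_request`, and `profile` are hypothetical examples written for this sketch; other languages have their own profilers, but the workflow of running the code under a profiler and reading where the time went is similar.

```python
import cProfile
import io
import pstats


def find_duplicates_slow(items):
    """Deliberately inefficient: 'item in seen' scans a list on every iteration."""
    seen = []
    duplicates = []
    for item in items:
        if item in seen:
            duplicates.append(item)
        else:
            seen.append(item)
    return duplicates


def handle_request():
    """Stand-in for application code that happens to call the slow function."""
    items = list(range(2000)) * 2
    return len(find_duplicates_slow(items))


def profile(func):
    """Run func under cProfile and return its result plus a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)  # top 5 entries by cumulative time
    return result, buffer.getvalue()


result, report = profile(handle_request)
print(report)
```

In the report, `find_duplicates_slow` accounts for nearly all of the cumulative time of `handle_request`, which points to the list membership check as the hotspot. Replacing the `seen` list with a set would be the obvious fix to try and then re-profile.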
Be a Part of Hardware Sizing: Create a Performance Model From Scratch
Typically, every company should have a separate, dedicated environment for performance testing activities. Creating a performance model from scratch is quite challenging and varies from application to application and project to project, because it involves cost. Performance testing needs to be done in a dedicated environment under controlled conditions. If the performance testing environment keeps changing or is out of control, we can’t guarantee the performance test results.
The hardware sizing methodology used by most clients and organizations is all about sizing the CPU, RAM, disk, and network bandwidth of the web, app, and DB servers in the performance testing environment. Many performance engineers struggle to build a PROD-like performance testing environment that delivers accurate and reliable results.
Setting up a performance testing environment can also be very frustrating, given the time and resources it takes, and if it is not done correctly, it can produce inaccurate test results, which in turn can lead to poor user experience and revenue loss. To build a performance testing environment and conduct performance tests effectively, performance engineers must carefully consider factors such as network speed, server hardware and software, server load, user behavior, infrastructure connectivity, database size, application architecture, and volume of data, to avoid gaps between the prod environment and the performance testing environment.
In practice, it may not be possible for the performance testing environment to resemble the production environment completely. For example, it may not be economically feasible for the company to make the performance testing environment similar to production because of a limited project budget and other technical constraints. From my experience, I recommend that performance engineers think about what a production-like environment looks like and plan how they can replicate it. Can you create realistic users and scenarios? Can you simulate a realistic load over time? Are there pieces (individual internal and external components in production) that you can replicate without full-fledged performance testing in production? It is essential to make the performance testing environment as production-like as possible, because the closer it is to production, the more likely it is to expose the bottlenecks that live users would otherwise hit in production, where they can become a huge performance problem.
Develop Robust Performance Scripts
Most performance testing errors during test executions come from unstable VuGen and JMeter scripts that have not been debugged completely. Developing robust performance test scripts is the key to successful performance test executions. With the increased complexity of applications in distributed environments, on-premise or cloud, performance engineers may run into different errors and challenges during test executions, especially if dynamic values from the servers are not handled properly.
Every application has its own mechanisms for validation, verification, and authentication. Therefore, it's quite difficult to find scripting issues until performance engineers learn about the technologies involved in the application. For example, extracting dynamic data from web pages, dealing with multi-step authentication, and developing scripts for technologies like single-page applications, WebSockets, and AJAX can be tricky and complex, which can lead to script failures and inaccurate measurements.
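Handling dynamic values usually means extracting a server-generated value from one response and sending it back in the next request. Load-testing tools provide their own extractors for this; the plain Python sketch below only illustrates the idea. The `csrf_token` field name, the sample HTML, and the helper functions are hypothetical and would differ for every application.

```python
import re

# Hypothetical pattern: a hidden form field named "csrf_token" in the login page.
TOKEN_PATTERN = re.compile(r'name="csrf_token"\s+value="([^"]+)"')


def extract_token(html):
    """Pull the dynamic token out of a response body, failing loudly if it is missing."""
    match = TOKEN_PATTERN.search(html)
    if match is None:
        raise ValueError("csrf_token not found in response; check the extractor pattern")
    return match.group(1)


def build_login_payload(html, username, password):
    """Build the next request's form data using the token from the previous response."""
    return {
        "username": username,
        "password": password,
        "csrf_token": extract_token(html),
    }


# Made-up response body standing in for what the server returned on the previous step.
sample_response = '<form><input type="hidden" name="csrf_token" value="a1b2c3d4"></form>'
payload = build_login_payload(sample_response, "test_user", "secret")
print(payload["csrf_token"])  # a1b2c3d4
```

Raising an error when the value is not found is a deliberate choice: a script that silently replays a stale or empty token tends to produce failed transactions that look like application errors during the test run.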
Gain Experience With Multiple Monitoring Tools
Performance engineers must learn multiple monitoring tools because these tools help make sure your software applications are running properly, your infrastructure is up, and you don't face any business loss. Monitoring is, unfortunately, a massive domain that includes many tool categories and use cases, and each technology stack has its own monitoring tools. The categories include servers (k8s, virtualization), networks, log files, application performance, network performance, and many more.
Monitoring tools act as the eyes and ears of increasingly complex distributed IT systems, helping organizations optimize the performance of applications, infrastructure, and business services. They are used to continuously monitor and control IT infrastructure, applications, and networks to ensure the availability, performance, and reliability of IT systems, and to identify potential problems or failures early, before they become bigger problems in production. Monitoring systems and components inside and outside your architecture, in both production and pre-production environments, helps you learn, understand, and enhance capacity, resiliency, performance, and scale.
Performance engineers must gain knowledge and experience with multiple monitoring tools, especially in gathering metrics, monitoring components, configuring alerts, alarms, and anomaly detection, understanding the types of monitoring (for example, server monitoring, infrastructure monitoring, network monitoring, etc.), setting up dashboards, and developing monitoring strategies. They also need to know which resources need attention, what is causing a slowdown or outage, and how to organize and correlate metrics from different sources. This enables early detection of problems and performance degradations and helps service providers predict and calculate the service and business impact on their systems.
Conclusion
Thanks for reading the article and stay tuned for "What Is a Performance Engineer and How to Become One: Part 2."
Opinions expressed by DZone contributors are their own.