Is It Okay To Stop Running Your Tests After the First Failure?
Have you ever suspected that your tests will fail because a dependency became unavailable when you started them? If yes, then this article might be interesting for you!
Running tests is a necessary evil. I have never heard two or more engineers at the water cooler talking about the joys of test execution. Don't get me wrong, tests are great; you should definitely have plenty of them in your project. It is just not a pleasant experience to wait for their execution. If they start to fail, that is even worse. Most probably, someone could become very popular if they could reliably tell which tests will fail before executing them. As for the rest of us, I think we should keep running our tests.
Can We Somehow Reduce the Execution Time at Least?
We can at least try! We could start by understanding the problem more and looking at solutions later.
What Can Make Our Test Execution Slow?
When talking about higher-level tests, the usual suspects are latency and wait times. Latency can be a factor when we are testing a remote system, or when our system under test uses some (slow) external dependency that adds to the latency. Wait times are more common when working with UI tests, in my opinion, especially if the tests cannot easily find out when the JS code has completed execution. Given that many web applications rely on JS-heavy front ends, this can be quite frequent. A trivial solution is polling for some element's availability to detect that JS execution has completed. With this approach, the happy case is not that problematic because we can set our poll frequency according to our needs. Failures, however, become slow, because testing for the lack of something is harder: we cannot rule out that it is simply not present yet but will appear in the next millisecond. Due to this, our tests end up defining timeouts for these types of checks, which means that failed tests can run a lot longer than passing tests because failures need to wait for the timeout to happen. This becomes frustrating when a core feature breaks and many of our tests need to wait for timeouts.
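To make the timeout trade-off concrete, here is a minimal sketch of such a poll-with-timeout check, assuming Selenium WebDriver is in use; the driver setup, the "result" element id, and the timeout values are assumptions picked purely for illustration:
import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

class ResultPanel {
    private final WebDriver driver;

    ResultPanel(final WebDriver driver) {
        this.driver = driver;
    }

    WebElement waitForResult() {
        // Polls every 500 ms; a passing test returns as soon as the element appears,
        // while a failing test blocks for the full 10 seconds before a TimeoutException.
        final WebDriverWait wait = new WebDriverWait(driver, Duration.ofSeconds(10), Duration.ofMillis(500));
        return wait.until(ExpectedConditions.visibilityOfElementLocated(By.id("result")));
    }
}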
Another cause might be simply having too many tests. Yes, there is such a thing as too many tests. For example, poorly designed test suites can cover certain parts with multiple tests exercising the same cases. To add insult to injury, we might see this while other functionality remains untested.
Furthermore, we might be spending time on setup or teardown activities. This can become worse with a higher number of failures. For example, if we need to recreate our Spring context because the failed tests dirtied the last one, or because the context failed to load, then our tests might end up spending more time on preparation than on the actual execution. Similar problems can happen if we need to start fresh Docker containers as test dependencies in order to isolate our test cases from each other.
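As a sketch of that setup cost, assuming JUnit Jupiter and Testcontainers are on the test classpath (the PostgreSQL image and the class name are made up for illustration), consider how much of a run can be spent just starting containers:
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.PostgreSQLContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

@Testcontainers
class CustomerRepositoryIT {

    // A static @Container starts once per class; a non-static one would be restarted
    // for every test method, multiplying the preparation time.
    @Container
    private static final PostgreSQLContainer<?> POSTGRES =
            new PostgreSQLContainer<>(DockerImageName.parse("postgres:16-alpine"));

    @Test
    void testDatabaseIsReachable() {
        // If the container fails to start, every test of this class fails during setup.
        Assertions.assertTrue(POSTGRES.isRunning());
    }
}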
Speeding Up Our Tests
We can start searching for solutions based on the factors above. Let us look at our options below.
Using More Threads
As the number of tests starts to grow and their execution gets longer, eventually, someone will come up with the brilliant idea of running them in parallel. After all, this is what we tried with the production code as well, and it worked well. Why would it be any different for the tests? It should be fine as long as you adhere to the F.I.R.S.T. principles. Sadly, these principles can be best applied in the case of unit tests. As soon as we try to use them on higher levels of the test pyramid, the situation will become a bit more complicated because we will start depending on shared resources (shared dependencies); therefore, our tests will stop being fully isolated.
I would argue that "I" (Isolated) is at least very challenging to achieve at the same time as "F" (Fast) and "T" (Thorough) when we are using shared resources. We will either need a lot of instances isolated from each other, or we will need additional code to isolate our test cases from each other. For example, creating different users or separate "tenants" in our database can help us avoid side effects or dependencies between our tests. This approach is very situational, though: not all applications, and certainly not all features, isolate data between users or tenants. Also, this can become a trade-off between "I" and "T" if we end up limiting the thoroughness of our tests to reduce the burden of isolation. Alternatively, it can become a trade-off between "I" and "F" if we decide that thoroughness is vital, making our tests run even longer. Factoring in the additional complexity and how hard it is to spin up the test environment, even if we have the tools to do so, we will likely see long test runs.
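A minimal sketch of that kind of isolation, assuming JUnit Jupiter; the UserClient type and its createUser method are hypothetical stand-ins for whatever client your tests already use:
import java.util.UUID;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class OrderFlowIT {

    private String testUser;

    @BeforeEach
    void setUp() {
        // Every test works with its own user, so parallel tests cannot interfere with each other's data.
        testUser = "it-user-" + UUID.randomUUID();
        // hypothetical helper call: new UserClient().createUser(testUser);
    }

    @Test
    void testOrderCanBePlaced() {
        // ... the test would place and verify an order on behalf of testUser only
    }
}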
Going back to the case of more threads, we must remember that using more threads will not reduce test execution time linearly. For example, if the tests are executed in about 10 minutes (600 seconds) on a single thread, the following graph shows the times representing the optimal outcome we could hope for.
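In numbers, the ideal (perfectly linear) curve would be 600 seconds on 1 thread, 300 on 2, 200 on 3, 150 on 4, 120 on 5, and 100 on 6 threads.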
The benefits become smaller and smaller, minimal even, once we reach 4-5 threads. In addition to this, modern CPUs tend to reach higher turbo clocks under single-threaded workloads, so per-thread performance can drop as more threads are used. Furthermore, many of us are not monitoring how efficiently the available threads are used; we would be surprised how many of them are idle if we did.
Running Fewer Tests
We could evaluate our tests, carefully weighing their cost and value, and then simply remove the ones that are no longer needed. This is a good idea regardless, as it can reduce the cost of maintenance as well, but we should be mindful of the benefits of having tests and having enough of them. Ultimately, this is one of the areas where QAs shine. Proper test design and evaluating the boundaries, severity, impact, and priority of each potential bug or feature we need to cover can help a lot.
Stopping After the First Few Failures
The idea of fail-fast is not new in software development. Engineers have been using it for decades. Since it can shorten the feedback loop in our code, it makes sense to consider it here for the same reason. We are not the first to come up with this idea. In fact, both Maven and Gradle have features that can stop the build after the first few failed tests. Before we rush to turn them on, let us give it some thought and figure out when this feature could be most beneficial and what trade-off we might be making by doing so.
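For reference, enabling those built-in switches looks roughly like this; the snippet assumes a Gradle Kotlin DSL build script, and the Maven Surefire counterpart is shown as a comment:
// build.gradle.kts: stop the test task at the first failed test
tasks.test {
    failFast = true
}
// The same can be requested ad hoc from the command line: ./gradlew test --fail-fast
// Maven Surefire counterpart: mvn test -Dsurefire.skipAfterFailureCount=1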
When Is It Okay to Use Fail-Fast?
We could approach this question by looking at the effects of using these features. In general, when we stop after the first failure, we are rephrasing the question our tests can answer. Originally, it is "Which features are not passing our tests?", while in fail-fast mode, it turns into "Do we have any features which are not passing our tests?". There is a huge difference. On the one hand, as long as we only need a boolean answer or a green/red dot on a build monitor, then by all means, we can use the fail-fast options because we are not using the additional information either way. On the other hand, we should not use this feature if we need the additional data because we want to see the full picture.
A Compromise
Getting back to the initial problem, this would be a great use case for skipping those tests which are suspected to fail because we know that their feature is not working well for some reason. This way, we could form groups from our tests and use the fail-fast approach for the individual groups instead of the whole suite. This can tell us more about what is working and still avoid running many of the failing tests. Although the Maven and Gradle features mentioned above cannot help with this, the good news is that it is still feasible, even if it requires some work. Let us dive deeper in the next sections!
A Selective Fail-Fast
I think we should list what we need to make this selective fail-fast execution work.
Forming Groups
It is tempting to use some kind of labeling system, adding one or more labels to each test (class or method) and putting them into different categories based on which feature they belong to or, more importantly, what kind of shared resources they depend on. Most likely, these labels will produce different sets with many intersections between them. This might sound complicated, but since nothing is ever simple or pure, we should embrace these overlapping parts instead of burying our heads in the sand. We can, for example, say that whenever a test is in multiple sets, we make a note of the outcome in every matching set and evaluate the situation based on the results.
We need to face yet another challenge regarding forming groups, because we cannot expect anyone to willingly annotate hundreds of tests one by one with dependency metadata. We are lazier than that! We should use some kind of pattern matching, or we could at least rely on annotations that are already in use, just in case someone has been diligently tagging each feature.
Not All Failures Are Equal
Having our groups is half of the battle; we need to figure out what to do with them! Referring back to the causes we discussed in the earlier sections, we could handle different kinds of failures with different approaches. For example, if the issue happens during test setup, such as Spring context startup or container launch, we can expect a larger number of failures because every test of the class will most likely fail. Due to this, these failures could count as more serious compared to the regular, isolated case of a single failing test case.
As a result, we could say that once a group reaches a threshold of failures, especially failures that happened even before the test code could start executing, we should skip the rest of that group.
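The gist of that decision fits into a few lines. The sketch below is a simplified, hypothetical illustration of the idea (all names are made up); it is not the actual implementation of the library discussed later:
// Simplified, hypothetical illustration of a per-group abort decision.
class GroupHealth {
    private final int burnInTestCount;
    private final int abortThresholdPercentage;
    private int executed;
    private int failed;

    GroupHealth(final int burnInTestCount, final int abortThresholdPercentage) {
        this.burnInTestCount = burnInTestCount;
        this.abortThresholdPercentage = abortThresholdPercentage;
    }

    synchronized void record(final boolean success) {
        executed++;
        if (!success) {
            failed++;
        }
    }

    synchronized boolean shouldAbort() {
        // Let a few tests run first (burn-in), then skip the rest of the group once
        // the failure percentage climbs above the configured threshold.
        return executed >= burnInTestCount
                && (failed * 100 / executed) > abortThresholdPercentage;
    }
}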
Skipping Tests
Once we have decided to ignore a test because past executions resulted in enough failures for one of its matching groups, we need a way to skip its execution instead of failing it. This way, the test report will display the truth and tell us how many tests failed and how many were never executed because we changed our minds. Fortunately, each of the major test frameworks relies on a special exception when implementing its assumption-handling features. Normally, we can use assumptions to decide whether a test case should run based on some information we can evaluate during the execution of the test suite. This is more or less what we want to do as well, so we can simply throw the right kind of exception before the test case would start running. This will show up as skipped in every report.
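In practice, this means throwing the framework's assumption exception before the test body runs. A minimal Jupiter sketch follows; the groupIsStillHealthy check is a hypothetical placeholder (in TestNG, throwing a SkipException achieves the same effect):
import org.junit.jupiter.api.Assumptions;
import org.junit.jupiter.api.Test;

class FlakyDependencyTest {

    @Test
    void testCallRemoteService() {
        // When the condition is false, Jupiter marks the test as aborted/skipped instead of failed.
        Assumptions.assumeTrue(groupIsStillHealthy(), "Aborting: failure threshold reached for this group");
        // ... the actual test code would follow here
    }

    private boolean groupIsStillHealthy() {
        // hypothetical check, e.g. backed by something like the GroupHealth sketch above
        return true;
    }
}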
A New Report
Since our use case is not exactly what traditional test reports are designed for, it can be vital to have a granular test report that is aware of what we are doing, making note of each decision, each execution result, execution times, threads used, exceptions thrown, and so on. We can even use it to evaluate how effective our multi-threaded test execution is and investigate test isolation issues.
How Can We Try It?
A Java library named Abort-Mission is already available, implementing all aspects of this selective fail-fast approach. Please continue reading to see how it works.
Terminology
Abort-Mission uses terms resembling space launch terminology:
- The test setup (context initialization, test instance instantiation, etc.) is named the countdown stage.
- The actual execution of the test method is named the mission stage.
- The countdown and the mission together are named a launch.
Each stage can end in one of four ways:
- Success: the execution completed without any issues.
- Failure: the test execution failed; an exception was thrown (e.g., an assertion error).
- Suppressed: the execution failed, but the method/class was marked with an annotation to make Abort-Mission ignore this failure.
- Aborted: Abort-Mission decided to skip the execution because it was assumed to fail.
How Can You Integrate It?
The library consists of multiple modules: mission-control implements the common decision-making logic, and a so-called booster provides integration with each supported test framework (JUnit Jupiter, TestNG, Cucumber). Each booster is different in order to fit the framework it supports, but overall, each integration consists of the following steps:
- Add the booster as a dependency.
- If you want to enable abort decisions for your test, make sure the relevant Abort-Mission listener/rule/extension/hook is executed (can be done globally in many cases).
- Create a class in the root package of the test classpath that can define when test execution should be aborted. You can group your tests either by using annotations or by using pattern-matching.
- If you want to enable reporting, then:
  - configure the report directory; AND
  - add the Gradle/Maven plugin, or simply run the provided executable jar after your tests have stopped running.
Please see the following sections for framework-specific examples using Gradle!
JUnit Jupiter
Add the booster as a dependency:
testImplementation("com.github.nagyesta.abort-mission.boosters:abort.booster-junit-jupiter:4.2.0")
Annotate your tests with @LaunchAbortArmed:
// HINT: Add Abort-Mission annotation
@LaunchAbortArmed
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
public class WeatherRepositoryTest {
    // test methods
}
Use the additional core annotations or the @Tag annotation provided by Jupiter to define your dependencies and group your tests properly:
@Test
@MissionDependencies("api-key") //Abort-Mission provided annotation
public void testGetCurrentWeatherShouldReturnCurrentHumidityWhenApiKeyIsPresent() throws Exception {
    //given
    final WeatherRepository underTest = new WeatherRepository(API_KEY, new ObjectMapper().reader());
    //when
    final Weather actual = underTest.getCurrentWeather("London");
    //then
    Assertions.assertTrue(actual.getHumidity() > 0, "Humidity should be greater than 0");
}

@Test
@Tag("end-to-end") //The Jupiter Tag
public void testGetCurrentWeatherShouldReturnCurrentTemperatureWhenApiKeyIsPresent() throws Exception {
    //given
    final WeatherRepository underTest = new WeatherRepository(API_KEY, new ObjectMapper().reader());
    //when
    final Weather actual = underTest.getCurrentWeather("London");
    //then
    Assertions.assertTrue(actual.getTemperature() > -20.0, "Temperature should be greater than -20.0°C");
    Assertions.assertTrue(actual.getTemperature() < 50.0, "Temperature should be less than 50.0°C");
}
Implement MissionOutline, naming it MissionOutlineDefinition, preferably in your root package:
public class MissionOutlineDefinition extends MissionOutline {
    @Override
    protected Map<String, Consumer<AbortMissionCommandOps>> defineOutline() {
        // HINT: use the default, "shared" namespace by not adding a name
        return Map.of("", ops -> {
            final TagDependencyNameExtractor tagNames = new TagDependencyNameExtractor();
            // Note: More kinds of matchers available.
            // See the methods of the builder returned by the matcher() method.
            final MissionHealthCheckMatcher endToEnd = matcher()
                    .dependencyWith("end-to-end").extractor(tagNames).build();
            final MissionHealthCheckMatcher apiKeyMissionDependency = matcher()
                    .dependency("api-key").build();
            ops.registerHealthCheck(percentageBasedEvaluator(
                    matcher().or(endToEnd)
                            .orAtLast(apiKeyMissionDependency).build())
                    .abortThreshold(25) // abort if failure percentage is higher than 25%
                    .burnInTestCount(1) // execute 1 test before evaluating the threshold
                    .build());
            ops.registerHealthCheck(reportOnlyEvaluator(matcher().anyClass().build()).build());
        });
    }
}
You can find the full example in the repository here.
TestNG
Add the booster as a dependency:
testImplementation("com.github.nagyesta.abort-mission.boosters:abort.booster-testng:4.2.0")
Annotate your tests with @LaunchAbortArmed and add the AbortMissionListener to your tests as a listener:
// HINT: Add Abort-Mission annotation
@LaunchAbortArmed
// HINT: Add listener
@Listeners(AbortMissionListener.class)
public class WeatherRepositoryTest {
    //test methods
}
Use the additional core annotations or the groups attribute of the @Test annotation provided by TestNG to define your dependencies and group your tests properly:
@Test
@MissionDependencies("api-key") //Abort-Mission provided annotation
public void testGetCurrentWeatherShouldReturnCurrentHumidityWhenApiKeyIsPresent() throws Exception {
    //given
    final WeatherRepository underTest = new WeatherRepository(API_KEY, new ObjectMapper().reader());
    //when
    final Weather actual = underTest.getCurrentWeather("London");
    //then
    Assert.assertTrue(actual.getHumidity() > 0, "Humidity should be greater than 0");
}

@Test(groups = "end-to-end") //The groups we might be using already
public void testGetCurrentWeatherShouldReturnCurrentTemperatureWhenApiKeyIsPresent() throws Exception {
    //given
    final WeatherRepository underTest = new WeatherRepository(API_KEY, new ObjectMapper().reader());
    //when
    final Weather actual = underTest.getCurrentWeather("London");
    //then
    Assert.assertTrue(actual.getTemperature() > -20.0, "Temperature should be greater than -20.0°C");
    Assert.assertTrue(actual.getTemperature() < 50.0, "Temperature should be less than 50.0°C");
}
Implement MissionOutline, naming it MissionOutlineDefinition, preferably in your root package:
public class MissionOutlineDefinition extends MissionOutline {
    @Override
    protected Map<String, Consumer<AbortMissionCommandOps>> defineOutline() {
        // HINT: use the default, "shared" namespace by not adding a name
        return Map.of("", ops -> {
            final GroupDependencyNameExtractor groupNames = new GroupDependencyNameExtractor();
            // Note: More kinds of matchers available.
            // See the methods of the builder returned by the matcher() method.
            final MissionHealthCheckMatcher endToEnd = matcher()
                    .dependencyWith("end-to-end").extractor(groupNames).build();
            final MissionHealthCheckMatcher apiKeyMissionDependency = matcher()
                    .dependency("api-key").build();
            ops.registerHealthCheck(percentageBasedEvaluator(
                    matcher().or(endToEnd)
                            .orAtLast(apiKeyMissionDependency).build())
                    .abortThreshold(25) // abort if failure percentage is higher than 25%
                    .burnInTestCount(1) // execute 1 test before evaluating the threshold
                    .build());
            ops.registerHealthCheck(reportOnlyEvaluator(matcher().anyClass().build()).build());
        });
    }
}
You can find the full example in the repository here.
Cucumber
Add the booster as a dependency:
testImplementation("com.github.nagyesta.abort-mission.boosters:abort.booster-cucumber-jvm:4.2.0")
Use the AbortMissionPlugin to enable reporting:
@RunWith(Cucumber.class)
// HINT: AbortMissionHook is picked up automatically as it is in the same package
@CucumberOptions(features = "classpath:/features",
        // HINT: Add plugin for reporting
        plugin = "com.github.nagyesta.abortmission.booster.cucumber.AbortMissionPlugin",
        objectFactory = PicoFactory.class)
public class RunTest {
}
Create a Cucumber hook extending the LaunchAbortHook class:
public class AbortMissionHook extends LaunchAbortHook {
    @Override
    protected Map<String, Consumer<AbortMissionCommandOps>> defineOutline() {
        // HINT: use the default, "shared" namespace by not adding a name
        return Map.of("", ops -> {
            final TagDependencyNameExtractor tagNames = new TagDependencyNameExtractor();
            // Note: More kinds of matchers available.
            // See anyScenarioMatcher(), scenarioNameMatcher() and scenarioUriMatcher()
            final MissionHealthCheckMatcher endToEndMatcher = matcher()
                    .dependencyWith("EndToEnd").extractor(tagNames).build();
            ops.registerHealthCheck(percentageBasedEvaluator(endToEndMatcher)
                    .abortThreshold(25) // abort if failure percentage is higher than 25%
                    .burnInTestCount(1) // execute 1 test before evaluating the threshold
                    .build());
            ops.registerHealthCheck(reportOnlyEvaluator(anyScenarioMatcher()).build());
        });
    }

    @Before
    @Override
    public void beforeScenario(final Scenario scenario) {
        doBeforeScenario(scenario);
    }

    @After
    @Override
    public void afterScenario(final Scenario scenario) {
        doAfterScenario(scenario);
    }

    @Override
    protected void doAbort() {
        throw new AssumptionViolatedException("Aborting as the launch failure threshold is reached.");
    }
}
Make sure to use relaxed schema validation when you are generating the flight evaluation report:
// HINT: Configure Abort-Mission plugin
abortMission {
    version = "4.2.0"
    relaxedValidation = true
}
You can find the full example in the repository here.
The Results
Once you are finished, you should run your tests and see the generated reports under your build directory. For example, the Jupiter module can generate a report like this when the external dependency cannot be reached (because the API key is missing or invalid).
I know skipping two failing tests might not be too impressive. Also, we could have used assumptions in the first place to find out when the API key is missing, but assumptions cannot solve everything; for example, you need to make a real service call to figure out whether authentication is working. So, putting aside the fact that this is a very simple, lab-grown example, I think it demonstrates the concept, and you can take it to the next level by applying these simple ideas to your own project.
Conclusion
We have demonstrated that stopping or skipping tests in case of failures should not always be a yes or no decision. If the higher-level tests depend on external dependencies or similar factors, then stopping only the related (failing) tests can be a valid choice.
In fact, I have already used this in a hobby project, where it helped me see my Cucumber reports a bit faster when I inadvertently broke a feature or two. You can find out more here.
What do you think? Do you have a use case where this approach could make sense?