Two Cool Java Frameworks You Probably Don’t Need
Mutation testing and property-based testing are two relatively niche technologies in the Java tester's toolkit.
We’ve all attended – maybe even delivered – talks where the speaker is particularly enamored with a language or tool and uses the word "simply" a little too often, in phrases like "simply add this configuration key or dependency." Some healthy suspicion to counteract this enthusiasm is always recommended, especially when the technology is new, little used, specialist, or a combination of all three.
Software frameworks never work for free, even when you don’t pay a license fee. Unfamiliar tech has a learning curve, however gentle. Don’t tell me I can have all the benefits of tool X by "just" adding a single line to build.gradle (or twenty to pom.xml). Testing tools in particular should always be approached as a potential liability: they must justify their total cost of ownership with a commensurate increase in quality, and that higher quality must make business sense. Sorry to dampen your passion, but you’re not hired as an artist. Such an equation is impossible to express in hard numbers; common sense will have to suffice.
Property-Based Testing and Mutation Testing
In this post, I want to discuss two established but relatively niche techniques in the Java tester’s toolkit: mutation testing with Pitest and property-based testing with jqwik. I wrote earlier about PBT and MT with a developer’s hat on and a "tech for tech’s sake" mindset. Now I will don the CFO’s budget-approval hat and explain why you should think hard before using them. But first, a very concise refresher if you’re unfamiliar with the technologies.
A mutation testing (MT) framework makes small but significant alterations (mutants) to the compiled code under test. Don’t worry, this is done in memory and the sources are not touched. The JVM can still run the altered bytecode, but the changed behavior should now cause at least one unit test to fail, provided you have adequate coverage. We call this killing the mutants, in good gamers' parlance. MT is particularly revealing when a test suite has great coverage but poor assertions: many tests will stay green even though the behavior has changed, and in mutation testing a surviving (green) mutant is bad news.
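To make that concrete, here is a minimal, hypothetical sketch (the class and method names are mine, not from the companion project) of a covered line whose mutant survives a weak assertion:
import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

class WeakAssertionExample {

    // production logic, inlined here for brevity
    static int feeFor(int age) {
        return age < 16 ? 10 : 15;
    }

    // The line above is covered, but the assertion is too weak: Pitest's
    // conditionals-boundary mutator turns 'age < 16' into 'age <= 16', so
    // feeFor(16) returns 10 instead of 15, yet this test stays green and
    // the mutant survives.
    @Test
    void sixteen_year_olds_pay_a_fee() {
        assertTrue(feeFor(16) > 0);
    }
}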
Property-based testing (PBT) is altogether different. It lets you define test scenarios for properties, which are true statements that hold for a range of values. "Nobody under the age of 18 is admitted" is such a statement. With PBT, some method boolean isAgeAllowed(int age) can then be invoked with a range of randomized values between zero and 18. See AdmissionCalculatorPropertySuite in the companion GitLab project for examples.
git clone git@gitlab.com:jasper-sprengers/pbt-and-mt.git
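A jqwik property for the admission statement above might look like the following sketch; the isAgeAllowed implementation here is hypothetical, and the real examples live in the companion suite:
import static org.assertj.core.api.Assertions.assertThat;

import net.jqwik.api.ForAll;
import net.jqwik.api.Property;
import net.jqwik.api.constraints.IntRange;

class AgeLimitProperties {

    // hypothetical implementation of the rule "nobody under 18 is admitted"
    boolean isAgeAllowed(int age) {
        return age >= 18;
    }

    @Property
    public void nobody_under_eighteen_is_admitted(@ForAll @IntRange(min = 0, max = 17) int age) {
        assertThat(isAgeAllowed(age)).isFalse();
    }
}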
MT is a quality validation technique that finds missing or incomplete assertions in existing tests. PBT enhances existing tests by hitting them from multiple angles. In the testing-pyramid spectrum of unit/component/integration/end-to-end tests, both are squarely positioned at the base. Despite their differences, they have one important thing in common: they are tools that can make a well-structured codebase better, but they are useless, if not actually harmful, in projects with low test maturity. In addition, they are non-trivial to deploy and use effectively.
Let’s elaborate with a canonical example of a good test-driven approach. You are writing a component that calculates an admission fee based on a patron’s date of birth. Luckily your team is a stickler for good specifications:
- A person’s age in years must evaluate to a non-negative value. Signal exceptions with “date of birth not valid."
- Children under the age of four or adults over ninety are not admitted to this scary theme park ride. Signal exceptions with “patrons must be between 4 and 90 years old."
- Admission is 10 euros for people aged 15 or younger, 15 euros for ages 16 and up.
The code is a simple set of if statements over an integer value (forgive my verbosity); the full code is in the companion GitLab project.
if (age < 0) {
throw new IllegalArgumentException("date of birth [%s] is not valid".formatted(dateOfBirth));
} else if (age < 4 || age > 90) {
throw new IllegalArgumentException("patrons must be between 4 and 90 years old, but is [%s]".formatted(age));
} else if (age < 16) {
return 10;
} else {
return 15;
}
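For reference, example-based tests covering these branches can be as plain as the following sketch, which borrows the getAdmissionForAge(int) call from the property test shown further down:
// getAdmissionForAge(int) is assumed to be the production method behind the snippet above
@Test
public void too_young_and_too_old_patrons_are_rejected() {
    assertThatThrownBy(() -> getAdmissionForAge(3)).isInstanceOf(IllegalArgumentException.class);
    assertThatThrownBy(() -> getAdmissionForAge(91)).isInstanceOf(IllegalArgumentException.class);
}

@Test
public void children_pay_ten_euros_and_everyone_else_fifteen() {
    assertThat(getAdmissionForAge(15)).isEqualTo(10);
    assertThat(getAdmissionForAge(16)).isEqualTo(15);
}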
It’s easy to achieve 100% rock-solid coverage with trivial code like this (see AdmissionCalculatorSuite). Most changes to the business rules will automatically result in a failed test, but not all. Let’s introduce a new rule.
Adults aged 65 or older pay 10 euros.
So children and senior citizens qualify for discounts. In terms of code, if (age < 16) becomes if (age < 16 || age >= 65).
All your unit tests still pass. Class, method, line, and even branch coverage is still a sterling 100%. The tests tell the truth, but it’s no longer the whole truth, because a new, untested edge case was introduced around the value 65. If you work test-driven, you should have written the extra test scenario before adding the new condition.
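That missing scenario boils down to a couple of assertions around the new boundary (again assuming the getAdmissionForAge(int) variant used in the property test below):
// the edge case introduced by the senior discount
@Test
public void adults_aged_sixty_five_or_older_pay_ten_euros() {
    assertThat(getAdmissionForAge(64)).isEqualTo(15);
    assertThat(getAdmissionForAge(65)).isEqualTo(10);
}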
When Mutation Testing Won’t Help
Could MT have caught the omission? Yes: it could have changed age >= 65 to age > 65 and alerted you to the fact that there wasn’t a test to cover this edge case. But you could and should have noticed that yourself when implementing the change. You can rightly argue that production code is never as trivial as this example. Surely MT is more helpful in boosting test quality when you inherit a large codebase? Here are my reasons why it probably isn’t.
- You don’t need MT to tell you when test suites have poor assertions. A full-text search on ‘assert’ in src/test/java tells you everything you need to know.
- You don’t need MT to check test coverage. There are more efficient tools for that. If large parts of the code are uncovered to begin with, MT cannot yield anything useful because there are no tests to kill the mutants.
- A tool like Pitest produces a very precise but also verbose report on the mutants that slipped beneath the radar. This will be huge if you have good coverage but poor assertions. It’s like a 747 cockpit with all warning lights blinking at once. Knowing which problem to address first requires judgment. It never makes sense to kill all mutants, for the same reason that 100% test coverage usually isn’t worth it.
- An MT framework runs the same test scenario multiple times for each mutant, so the code under test should execute fast. Trips to the database, file system, or network make a mutation test run unacceptably slow. Likewise, non-cohesive code with long methods and high cyclomatic complexity creates dozens of opportunities to introduce mutations, and the same long method will be invoked an inordinate number of times.
Catching the Unknown Edge Cases With PBT
PBT does a great job of catching the untested addition to our business logic. Unit tests give you the truth, but property tests give you the whole truth. Because a property run hammers the code with values from the whole range between 4 and 90, a property that pins down the pricing rule will fail as soon as it hits the range 65 to 90. One of the properties in the suite looks like this:
@Property
public void any_age_between_four_and_ninety_is_valid(@ForAll @IntRange(min = 4, max = 90) int age) {
assertThat(getAdmissionForAge(age)).isPositive();
}
Superficially, the above looks like a parameterized test:
@ParameterizedTest
@ValueSource(ints = {4, 90})
public void any_age_between_four_and_ninety_is_valid(int age) {
    assertThat(getAdmissionForAge(age)).isPositive();
}
Don't be fooled, though. The unit test above does not test "any age," only the edge cases we happen to know. It's tempting to use PBT as a shotgun approach to killing the edge cases you forgot about, but that runs counter to its spirit. You should start from documented properties and translate these into runnable test cases. Specifying these properties should precede both test and production code.
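To make that concrete for our example, the property worth documenting is the pricing rule itself. As long as it still encodes the pre-discount rule, a property along these lines (a sketch; the companion suite may phrase it differently) fails as soon as the senior discount lands in the code:
// still encodes the rule as it stood before the senior discount
@Property
public void patrons_aged_sixteen_and_up_pay_fifteen_euros(@ForAll @IntRange(min = 16, max = 90) int age) {
    assertThat(getAdmissionForAge(age)).isEqualTo(15);
}
Any run that generates a value of 65 or higher now fails, flagging exactly the range the example-based tests missed. Updating this property together with the specification, before touching the production code, is the order of work argued for above.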
No Framework Can Rescue a Poor Standard of Testing
When teams with poor testing maturity write unit tests, it is usually to confirm what the production code does. Given values X and Y, the method under test returns Z, so that’s what we assert. Pummeling it with a thousand different values (as PBT does) seems pointless, and if your automated tests only cement the status quo like that, it is pointless indeed. The best you can expect from such an approach is some protection against regression. Neither PBT nor MT will help you: they can’t reveal logical lapses of judgment in the implementation, let alone errors in interpreting the design. There probably wasn’t a design to begin with.
MT and PBT can be of value in well-tested business-critical code, full of if-statements, switches, and (numeric) edge cases, where you need all the robustness that money can buy. In contrast, if a method behaves predictably for any floating-point value, testing it a thousand times with random input doesn’t give you much insight.
Don’t use these frameworks for support functions, i.e., code that supports the algorithmic core of the application: the web or messaging endpoints, the database access layer, the security layer, or data transfer mapping logic. Don’t write PBT scenarios that hit such code, and make sure Pitest ignores those parts for mutation.
These resource-intensive frameworks are only worth it if you have properly isolated the salient algorithms into small classes that can be tested thousands of times without bringing the house down. You may even find that you don’t need mutation testing anymore as you refactor the logic into testable chunks and improve the coverage and quality of the unit tests.
PBT and MT are fascinating technologies, so do check them out. But they’re also the stuff of master’s theses. They have a slightly academic whiff to them, divorced from the world of business where quality bears a cost and must be negotiable. If you do decide to use them, take the time to know them well, and don’t get caught in a mindset of testing for testing’s sake.