Code Smells: If Statements
This comprehensive guide considers when if statements become a problem and the steps you can take to refactor your code.
Join the DZone community and get the full member experience.
Join For FreeThe article in this series that has so far provoked the most responses was on iteration. Today, I’m talking about if statements (conditionals). I’m not intending to go after any sacred cows or anything – for loops and if statements are pretty much the first things we learn in many programming languages. They’re a fundamental part of languages like Java. But just because these basic building blocks exist doesn’t mean they should be our goto strategy (see what I did there?) every time. We could program in assembler but we don’t. We could use a hammer for every job around the house, but we don’t (well…).
If statements are very useful, and often the right tool for the job, but they can also be a smell. Remember that by “smell” I mean “code that you should look carefully at to see if there’s a simpler/easier to understand, approach”. I don’t mean “if statements are evil never use them”.
In the last two articles, we’ve been looking at the smells from a single method, validateQuery
. You may have noticed we’ve been ignoring the series of if
s that make up the main body of the method. Now seems like a good time to turn our attention to them. We’re going to start with (more or less) the version of this method that has already benefited from the previous two refactoring blog posts. I’ve made some minor changes though to make the refactoring in this article a bit simpler:
static ValidatedField validateQuery(final Class clazz, final Mapper mapper,
final String propertyPath, final boolean validateNames) {
final ValidatedField validatedField = new ValidatedField(propertyPath);
if (!propertyPath.startsWith("$")) {
final String[] pathElements = propertyPath.split("\\.");
final List<String> databasePathElements = new ArrayList<>(asList(pathElements));
if (clazz == null) {
return validatedField;
}
validatedField.mappedClass = mapper.getMappedClass(clazz);
for (int i = 0; ; ) {
final String fieldName = pathElements[i];
final boolean fieldIsArrayOperator = fieldName.equals("$");
Optional<MappedField> mf = validatedField.mappedClass.getMappedField(fieldName);
//translate from java field name to stored field name
if (!mf.isPresent() && !fieldIsArrayOperator) {
mf = validatedField.mappedClass.getMappedFieldByJavaField(fieldName);
if (validateNames && !mf.isPresent()) {
throw fieldNotFoundException(propertyPath, fieldName, validatedField.mappedClass);
}
if (mf.isPresent()) {
databasePathElements.set(i, mf.get().getNameToStore());
}
}
validatedField.mappedField = mf;
i++;
if (mf.isPresent() && mf.get().isMap()) {
//skip the map key validation, and move to the next fieldName
i++;
}
if (i >= pathElements.length) {
break;
}
if (!fieldIsArrayOperator) {
//catch people trying to search/update into @Reference/@Serialized fields
if (validateNames && (mf.get().isReference() || mf.get().hasAnnotation(Serialized.class))) {
throw cannotQueryPastFieldException(propertyPath, fieldName, validatedField);
}
if (!mf.isPresent() && validatedField.mappedClass.isInterface()) {
break;
} else if (!mf.isPresent()) {
throw fieldNotFoundException(propertyPath, validatedField);
}
//get the next MappedClass for the next field validation
MappedField mappedField = mf.get();
validatedField.mappedClass = mapper.getMappedClass((mappedField.isSingleValue()) ? mappedField.getType() : mappedField.getSubClass());
}
}
validatedField.databasePath = databasePathElements.stream().collect(joining("."));
}
return validatedField;
}
If you remember, the thing that originally drew my attention to this method was the multiple isPresent
checks that are the result of wrapping the MappedField mf
in an Optional
. I thought Optional
was going to solve all my problems, and it didn’t. But of course, the problem was not the introduction of Optional
– this code originally had null checks in every place we now check isPresent()
, so I should have been asking myself, why do we need to check if something is null so many times? Mutability was causing some of the confusion, so now we’ve simplified the method to minimise mutability, let’s take a closer look at the rest of the code.
The Smell: If Statements
Looking at the code above you can see 11 different if statements, many of which check more than one condition. Two contain a break
, one a return
. There’s only one else
, which actually follows one of the if statements with a break
, making it almost entirely pointless. By the end of the method it’s possible that you’ve visited more than one of the conditional branches, and it’s hard to reason about which ones, why, and what that means.
The problem with this code is that it’s actually doing two things – it’s (possibly) validating the query (based on the validateNames
parameter value), and it’s also traversing the String property path and converting this into a domain object, the MappedField
that represents the element at the end of this path. I can see why these two things are conflated, as it means iterating over the path only once, but it a) requires a flag to decide whether to do the validation bits and b) makes it really hard to tell which parts are required for validation, which parts are required for navigating the path and which parts are required for obtaining the relevant information to get the appropriate MappedField
. On top of that, some of the conditionals are used to control the looping behavior as well.
Step 1: Place Guard Conditions at the Start
Some of these if statements are controlling the flow – returning from the method if possible. What I would like to do is move these as early in the method as possible, so my brain can spot these special conditions and then ignore them for the rest of the method.
The first of these is straightforward:
final ValidatedField validatedField = new ValidatedField(propertyPath);
if (!propertyPath.startsWith("$")) {
final String[] pathElements = propertyPath.split("\\.");
final List<String> databasePathElements = new ArrayList<>(asList(pathElements));
if (clazz == null) {
return validatedField;
}
//...
I want to move the guard condition on clazz
right to the beginning, outside of the first if
. There are a couple of reasons: 1) all the stuff that happens before the check just isn’t needed for this null check, it will work without them (and we don’t incur the performance hit of String parsing when the clazz
is null and doesn’t need to do it anyway) 2) it seems like the right thing to do to check the param values and exit as fast as possible before getting into any further logic.
You have to do a quick check yourself to make sure this doesn’t actually change the behaviour of the method, because potentially the method may have been returning two different values in the case of a null clazz
: one value when the property path starts with $ and one where it doesn’t. If we look at the whole method, we return our brand new validatedField
in both cases, so it’s fine to move this check right to the top:
final ValidatedField validatedField = new ValidatedField(propertyPath);
if (clazz == null) {
return validatedField;
}
if (!propertyPath.startsWith("$")) {
final String[] pathElements = propertyPath.split("\\.");
final List<String> databasePathElements = new ArrayList<>(asList(pathElements));
While I’m here, I’m going to do another, unrelated refactoring. What does is mean when the propertyPath
starts with a dollar? Why does that matter? We could add a comment to this if statement to clarify its meaning, but my preference is usually to wrap these conditions in a method that returns a boolean, and give it a useful name:
if (!isOperator(propertyPath)) {
final String[] pathElements = propertyPath.split("\\.");
final List<String> databasePathElements = new ArrayList<>(asList(pathElements));
//... rest of the method here
}
private static boolean isOperator(String propertyPath) {
return propertyPath.startsWith("$");
}
MongoDB paths that start with a dollar sign are usually some sort of operation, not a path to a field, so this method is simply ignoring operators and only cares about fields. Actually, if I preferred, I could better reflect this in the code by pushing the “not” into the method and calling it isFieldPath
, if I think that’s more understandable. This is totally a personal choice.
Step 2: Remove Logic for Controlling the Iteration
The abuse of the for loop syntax in this code pains me:
for (int i = 0; ; ) {
I want to turn this into a proper for loop, or possibly a while. In a different version, I created my own enumeration to manage the looping. It’s not at all obvious from the code (and there’s certainly no magic refactoring in IntelliJ IDEA) how to fix this problem. Moving directly to...
for (int i = 0; i< pathElements.length; i++) {
...is not going to work, because we manage the increment of i
inside the method. So we have to identify the code that manages the looping, and replace those pieces with something that will work in a traditional loop.
i++;
if (mf.isPresent() && mf.get().isMap()) {
//skip the map key validation, and move to the next fieldName
i++;
}
if (i >= pathElements.length) {
break;
}
This section right in the middle of the code is the bit that manipulates the index and decides when the loop is finished. Once we’ve identified that, we can work out which bits to remove or change.
The i++
can be replaced with the i++
in the traditional for
loop. But if we remove this line and rely on the loop increment, i
is going to be incremented at the end of the loop, not halfway through, so we need to take that into account and make appropriate changes.
If we want to remove the increment, anywhere that was using the value of i
after line 76 needs to use (i+1)
, because that’s what the previous value effectively was. This is only in one place (line 82, see above), so it’s fairly easy.
if ((i + 1) >= pathElements.length) {
break;
}
With these changes, the tests should all still pass (and they do, fortunately).
Everything after the break statement happens only if there are more elements to process after this current one. That’s not really obvious from the code as it is, but if we flip the if statement...
...we get:
if ((i + 1) < pathElements.length) {
if (!fieldIsArrayOperator) {
//catch people trying to search/update into @Reference/@Serialized fields
if (validateNames &&
(mf.get().isReference() || mf.get().hasAnnotation(Serialized.class))) {
throw cannotQueryPastFieldException(propertyPath, fieldName, validatedField);
}
if (!mf.isPresent() && validatedField.mappedClass.isInterface()) {
break;
} else if (!mf.isPresent()) {
throw fieldNotFoundException(propertyPath, validatedField);
}
//get the next MappedClass for the next field validation
MappedField mappedField = mf.get();
validatedField.mappedClass = mapper.getMappedClass((mappedField.isSingleValue()) ? mappedField.getType() : mappedField.getSubClass());
}
} else {
break;
}
Now we’re using a proper for loop with a guard condition, we don’t need the break anymore (especially as it’s right at the end of the method). The for loop itself will make sure don’t go past the end of the array. So we can remove this else statement completely.
Step 3: Extract Method for Readability
As before, now I’m reasonably happy with the state of one of these conditional I’m going to extract the condition into a small method with a useful name so I can better understand what I’m checking.
if (hasMoreElements(pathElements, i)) {
if (!fieldIsArrayOperator) {
// rest of the method....
}
private static boolean hasMoreElements(String[] pathElements, int index) {
return (index + 1) < pathElements.length;
}
Now our code is a bit easier to understand. We’re using a traditional for loop to iterate over the array, so we removed the logic around incrementing the index and guarding against going past the end of the array. The only minor complication around this is the intentional additional increment to skip particular values, but this is a bit easier to see and understand now we’ve removed all the other index management.
Step 4: Replace Multiple Checks of Value
Several if statements check the same condition, so what I would like to do is see if I can evaluate this condition separately, just once, to simplify the reasoning.
We see fieldIsArrayOperator
is checked twice, and both times we care about if it is not an array operator. My question, therefore, is what do we do, if anything, when it is an array operator? Here we actually need to either go into depth digging through the code, or apply some domain knowledge.
Line 65:
if (!mf.isPresent() && !fieldIsArrayOperator) {
// ...do something
}
Don’t run this code if it’s an array operator.
Line 82
if (!fieldIsArrayOperator) {
//...do something
}
Don’t run this if it’s an array operator.
The only code that gets executed if the field name is an array operator is:
Optional<MappedField> mf = validatedField.mappedClass.getMappedField(fieldName);
validatedField.mappedField = mf;
if (mf.isPresent() && mf.get().isMap()) {
//skip the map key validation, and move to the next fieldName
i++;
}
With research, testing and/or domain knowledge we can figure out that mf
is always going to be empty when the fieldName
is “$”, and that if it’s an array operator it’s not going to be a Map
. Therefore I can surmise that zero code gets executed if fieldName
is an array operator.
That means I can shortcut all the code under the circumstances that fieldIsArrayOperator
is true:
for (int i = 0; i < pathElements.length; i++) {
final String fieldName = pathElements[i];
final boolean fieldIsArrayOperator = fieldName.equals("$");
if (fieldIsArrayOperator) {
continue;
}
//... etc ...
}
Basically, if the fieldName
is a dollar sign, skip this and move on to the next. Here I have to run all the tests again to check my hypothesis is correct (they pass, which is good). Now I can do some cleanup. IntelliJ IDEA confirms that this has achieved what I wanted – now I’m checking for the value right at the beginning, I can remove the check from later statements (lines 68 and 85).
Like before, I want this if statement to call a usefully-named method, so I’m going to replace the variable name with a method call instead.
Step 5: Extract getMappedField Method (Again)
The sharp-eyed amongst you will notice that this code does not include the refactoring from the mutable values post, where I extracted a getMappedField
method to constrain the mutation of mf
to a single method. I did this because I wanted to be able to simplify the conditions inside that method too. Now we’ve done that (we removed fieldIsArrayOperator
), we can re-extract this method. I’ve decided to give it a different name to last time to reflect what’s really going on, it’s now getAndValidateMappedField
.
I still don’t like how many parameters this method has, but we’ll look into that later.
@NotNull
private static Optional<MappedField> getAndValidateMappedField(String fieldName, String propertyPath, boolean validateNames,
ValidatedField validatedField, List<String> databasePathElements,
int i) {
Optional<MappedField> mf = validatedField.mappedClass.getMappedField(fieldName);
//translate from java field name to stored field name
if (!mf.isPresent()) {
mf = validatedField.mappedClass.getMappedFieldByJavaField(fieldName);
if (validateNames && !mf.isPresent()) {
throw fieldNotFoundException(propertyPath, fieldName, validatedField.mappedClass);
}
if (mf.isPresent()) {
databasePathElements.set(i, mf.get().getNameToStore());
}
}
return mf;
}
Step 6: Collapse Two Ifs Into an If Else
The main content of my getAndValidateMappedField
method is:
if (!mf.isPresent()) {
mf = validatedField.mappedClass.getMappedFieldByJavaField(fieldName);
if (validateNames && !mf.isPresent()) {
throw fieldNotFoundException(propertyPath, fieldName, validatedField.mappedClass);
}
if (mf.isPresent()) {
databasePathElements.set(i, mf.get().getNameToStore());
}
}
IntelliJ IDEA wants me to collapse the last one into a “functional” expression, but I hope we can do better than that. There’s three if statements here and all of them check isPresent
. This seems slightly ridiculous. Granted, this is the method that contains the mutable mf
value, so that’s a contributing factor – we’ll need two isPresent
checks because we try getting the value twice. If we look at the inner if
s though, surely it doesn’t make sense to have one if statement that does something if it’s not present and one that does something if it is – that sounds like an if/else, right?
Now here I might expect IntelliJ IDEA to offer me some magic fix, but I’m assuming the presence of the validateNames
check is messing with the IDE’s ability to reason about the code (or: the order of these checks might actually matter under some circumstances). Here, we can use our ability to reason to figure out that actually these two if
s can be swapped:
if (mf.isPresent()) {
databasePathElements.set(i, mf.get().getNameToStore());
}
if (validateNames && !mf.isPresent()) {
throw fieldNotFoundException(propertyPath, fieldName, validatedField.mappedClass);
}
This should easily translate to an if/else:
And we end up with:
if (mf.isPresent()) {
databasePathElements.set(i, mf.get().getNameToStore());
} else if (validateNames) {
throw fieldNotFoundException(propertyPath, fieldName, validatedField.mappedClass);
}
This does exactly the same thing, it’s just a bit easier to reason about. Why? Because 1) we see at a glance at most one of these branches will be executed now it’s an if/else 2) we’ve cut out one of the three isPresent
checks so we don’t have to keep reasoning about what it means if it’s not present 3) by working with the positive value (isPresent
, rather than !isPresent
) first it’s actually easier for humans to process. It’s also just nice to deal with the exceptional case at the end.
Step 7: Reducing Errors
Going back to the original method, let’s look at the last part of the code again:
IntelliJ IDEA tells me, rather alarmingly, that despite all my isPresent
checks, my mf.get()
here could actually be on an empty field. Ooops. The fact that code passes all the tests at this point either means mf
really is never empty at this point, or that we don’t have a test that exercises this code path. I would highly recommend trying to write some tests to get into this code section if this were my production code.
I’m tempted to delete all the code that I can, but lets do the safest thing. Firstly, we can move the check for interface right to the top
if (hasMoreElements(pathElements, i)) {
if (!mf.isPresent() && validatedField.mappedClass.isInterface()) {
break;
} else if (!mf.isPresent()) {
throw fieldNotFoundException(propertyPath, validatedField);
}
//catch people trying to search/update into @Reference/@Serialized fields
if (validateNames &&
(mf.get().isReference() || mf.get().hasAnnotation(Serialized.class))) {
throw cannotQueryPastFieldException(propertyPath, fieldName, validatedField);
}
//get the next MappedClass for the next field validation
MappedField mappedField = mf.get();
validatedField.mappedClass = mapper.getMappedClass((mappedField.isSingleValue()) ? mappedField.getType() : mappedField.getSubClass());
}
Technically this changes the logic compared to the original version, but it is much safer and if the tests still pass (which they do), it’s probably the right thing to do to check the edge conditions first.
Step 8: Simplify the Logic
I want to simplify all of this. I still hate the multiple checks for isPresent
, so is there something we can do here?
Given both the if
and the else
at the start of this code section return from the method, the rest of the code is effectively inside another else:
if (hasMoreElements(pathElements, i)) {
if (!mf.isPresent() && validatedField.mappedClass.isInterface()) {
break;
} else if (!mf.isPresent()) {
throw fieldNotFoundException(propertyPath, validatedField);
} else {
//catch people trying to search/update into @Reference/@Serialized fields
if (validateNames &&
(mf.get().isReference() || mf.get().hasAnnotation(Serialized.class))) {
throw cannotQueryPastFieldException(propertyPath, fieldName, validatedField);
}
//get the next MappedClass for the next field validation
MappedField mappedField = mf.get();
validatedField.mappedClass = mapper.getMappedClass((mappedField.isSingleValue()) ? mappedField.getType() : mappedField.getSubClass());
}
}
We can turn these statements around a little to make it a bit more readable again:
if (hasMoreElements(pathElements, i)) {
if (mf.isPresent()) {
//catch people trying to search/update into @Reference/@Serialized fields
if (validateNames &&
(mf.get().isReference() || mf.get().hasAnnotation(Serialized.class))) {
throw cannotQueryPastFieldException(propertyPath, fieldName, validatedField);
}
//get the next MappedClass for the next field validation
MappedField mappedField = mf.get();
validatedField.mappedClass = mapper.getMappedClass((mappedField.isSingleValue()) ? mappedField.getType() : mappedField.getSubClass());
} else if (validatedField.mappedClass.isInterface()) {
break;
} else {
throw fieldNotFoundException(propertyPath, validatedField);
}
}
This is marginally better because a) we only check isPresent
once, and we look at the case in which it IS there first, and deal with the different types of exceptions second and b) again using if/else
instead of a series of if
s, we should be able to reason about which branches are executed when – we know only one of the three options is ever executed.
Now we’ve moved all this around a bit we can simplify the code in the isPresent
case:
if (mf.isPresent()) {
MappedField mappedField = mf.get();
//catch people trying to search/update into @Reference/@Serialized fields
if (validateNames && (mappedField.isReference() || mappedField.hasAnnotation(Serialized.class))) {
throw cannotQueryPastFieldException(propertyPath, fieldName, validatedField);
}
//get the next MappedClass for the next field validation
validatedField.mappedClass = mapper.getMappedClass((mappedField.isSingleValue()) ? mappedField.getType() : mappedField.getSubClass());
} else if (validatedField.mappedClass.isInterface()) {
We don’t have so many unpleasant get()
calls, and we can see our get()
next to the isPresent()
check, which is comforting. I would have liked to use an ifPresent()
here, but we can’t combine that with the else
code paths.
Final Step: Extract (more) Methods for Readability
Now we’ve been able to simplify the if statements (removing, reordering, flipping etc), we should be reasonably happy with what each of the conditions is checking. At this point, I like to extract methods for all of the conditional logic to document what they do.
Firstly, I’m extracting a method to document what the map condition is checking. I could leave it as it is:
private static boolean isMap(Optional<MappedField> mf) {
return mf.isPresent() && mf.get().isMap();
}
…but it feels like there must be a more functional way of saying the same thing. I ended up with:
private static boolean fieldIsMap(Optional<MappedField> mf) {
return mf.map(MappedField::isMap)
.orElse(false);
}
I’ll be honest, I’m not sold on that being more readable. Of course, it does remove one of the many isPresent()
checks, which I guess is a Good Thing, and it is using the Optional
in a more functional style, which some people prefer. Which approach you take may be down to team preferences.
Next, I want to clarify what we’re doing here:
if (validateNames && (mappedField.isReference() || mappedField.hasAnnotation(Serialized.class))) {
I extracted just the second part into a method:
private static boolean cannotQueryPastCurrentField(MappedField mappedField) {
return mappedField.isReference() || mappedField.hasAnnotation(Serialized.class);
}
You could put validateNames
into the method as well, I chose to keep it out as I would like to keep it visible in the validateQuery
method.
The final method looks like:
static ValidatedField validateQuery(final Class clazz, final Mapper mapper,
final String propertyPath, final boolean validateNames) {
final ValidatedField validatedField = new ValidatedField(propertyPath);
if (clazz == null) {
return validatedField;
}
if (!isOperator(propertyPath)) {
final String[] pathElements = propertyPath.split("\\.");
final List<String> databasePathElements = new ArrayList<>(asList(pathElements));
validatedField.mappedClass = mapper.getMappedClass(clazz);
for (int i = 0; i < pathElements.length; i++) {
final String fieldName = pathElements[i];
if (isArrayOperator(fieldName)) {
continue;
}
validatedField.mappedField = getAndValidateMappedField(fieldName, validatedField, propertyPath,
validateNames, databasePathElements, i);
if (isMap(validatedField.mappedField)) {
//skip the map key validation, and move to the next fieldName
i++;
}
if (hasMoreElements(pathElements, i)) {
if (validatedField.mappedField.isPresent()) {
MappedField mappedField = validatedField.mappedField.get();
//catch people trying to search/update into @Reference/@Serialized fields
if (validateNames && cannotQueryPastCurrentField(mappedField)) {
throw cannotQueryPastFieldException(propertyPath, fieldName, validatedField);
}
//get the next MappedClass for the next field validation
validatedField.mappedClass = getMappedClass(mapper, mappedField);
} else if (validatedField.mappedClass.isInterface()) {
break;
} else {
throw fieldNotFoundException(propertyPath, validatedField);
}
}
}
validatedField.databasePath = databasePathElements.stream().collect(joining("."));
}
return validatedField;
}
Summary
We are now down to 47 lines of code (from 62). Some of that is being able to remove code, some is extracting methods, and some is simply because we’ve been able to remove whitespace and comments now the code is a tiny bit more self-explanatory. It is still a lot more procedural than I like, and new developers to the code will probably still wonder exactly what’s going on, but the method now fits onto a single laptop screen (just!) and is a bit easier for a human to parse.
Note that once again, I have not removed the use of the thing I accused of being a “smell” – we still have if
statements (a lot of them actually! But we’ve reduced it from 11 to 8). The differences are:
- We don’t have to clutter our navigation/validation checks with conditionals for managing the looping behavior, we let a standard
for
loop take care of that - We use more else statements, so it’s easier to reason around which code branches are executed and when
- We’ve moved early-exit code as close to the top of the method as possible, so that we can see those cases and ignore them when we’re looking further down the code
- We’ve extracted tiny methods to encapsulate our conditions (once we’d simplified them) so that we understand better what each if statement is really checking, and hopefully why.
Symptoms:
- Lots of if statements
- Very few else branches
- The same / similar conditions appearing in multiple branches (
isPresent
,fieldIsArrayOperator
) - Flow control being managed manually
Possible solutions:
- Flipping the logic. Humans find it easier to reason about the positive (rather than negative) values, so try to use positive terms where possible. If a negation is necessary, you might wrap this in a method that’s easier to understand.
- If a value is being checked regularly (like
fieldIsArrayOperator
was), it may be easier to reason about it if you pull out just this case and deal with that individually. Then you can remove the check from all the other places. - Consider whether an
else
is going to be more readable than a series ofif
s. Remember that a series ofif
s (withoutelse
) can potentially lead to all branches being executed, whereas an explicitelse
shows that this is an either/or. This is easier for a human to reason about, but it’s also potentially more efficient for the computer too. You may need to consider flipping the logic in order to make use ofelse
statements. - Once you’ve done as much as you can for reducing duplicate use of values in the conditions, consider refactoring each condition, even possibly single checks, into its own method with a meaningful name. This can enormously improve the readability of if statements (see: Extract Method, Decompose Conditional)
- Cases which are guard cases, basic validation or simple checks for something that might be an exit case could be moved to the top of any method (if this doesn’t impact the method’s overall logic) so that the reader of the code doesn’t have to reason about these cases throughout the method (see: Replace Nested Conditional with Guard Clauses).
- On the other hand, fall-through exceptional cases may belong at the end of the method or in a final
else
, so we understand this is a “when all else fails” situation.
Published at DZone with permission of Trisha Gee, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments