The Right Way to Reverse a String in Java
Learn the right way to reverse a String.
Facts and Terminology
As you probably know, Java uses UTF-16 to represent String. The char data type and the Character class are based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. Therefore, in the UTF-16 representation, some characters (Code Points) are represented by one char value (Code Unit) and others by two.
Please check out the Java String length confusion article and the Javadoc of the Character class for a more detailed explanation.
Example
Character: A
"UTF-16 representation" in Java: "\u0041"
Character: Mathematical double-struck capital A (𝔸, U+1D538)
"UTF-16 representation" in Java: "\uD835\uDD38"
The first one is straightforward. The second one is a little more interesting: this single character (Code Point) is represented by two Unicode escapes. This means a couple of things:
- This single character is represented by two char (or Character) values (Code Units).
- The length() of this String is two (see: Java String length confusion).
- The toCharArray() method returns a char array (char[]), which has two elements (0xD835 and 0xDD38 respectively).
- Both charAt(0) and charAt(1) return something (no StringIndexOutOfBoundsException), but these values are not valid characters on their own.
- If you do any character manipulation, you need to consider this case and handle these characters, which consist of two char values (surrogates).
- Therefore, most of the character manipulation code we ever wrote is probably broken.
All of this means that you probably do not want to manipulate characters by hand (see the Solution section below).
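The points in the list above can be checked directly. The following is a minimal sketch; the class name SurrogatePairDemo and the helpers codeUnits and codePoints are made up for illustration, while the methods they call (length, codePointCount, codePointAt, isHighSurrogate, isLowSurrogate) are standard JDK methods.

```java
public class SurrogatePairDemo {

    // Number of UTF-16 code units (char values) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String a = "\uD835\uDD38"; // double-struck capital A, U+1D538

        System.out.println(codeUnits(a));                            // 2
        System.out.println(codePoints(a));                           // 1
        System.out.println(a.toCharArray().length);                  // 2
        System.out.println(Character.isHighSurrogate(a.charAt(0)));  // true
        System.out.println(Character.isLowSurrogate(a.charAt(1)));   // true
        System.out.println(Integer.toHexString(a.codePointAt(0)));   // 1d538
    }
}
```

Note that length() counts code units, so it disagrees with codePointCount() as soon as a String contains a character outside the 16-bit range.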
Broken String Reverse
By this point, you might have a good guess what is wrong with this (very commonly used) solution to reverse a String:
static String reverse(String original) {
    String reversed = "";
    for (int i = original.length() - 1; 0 <= i; i--) {
        reversed += original.charAt(i);
    }
    return reversed;
}
Let's see it in action:
String str = "\uD835\uDD38BC"; // Three characters: A, B, C (4 chars)
System.out.println(str); // prints ABC (A is the double-struck A)
System.out.println(reverse(str)); // prints CB??
If you run the reverse method above, it will produce a String like this: "CB\uDD38\uD835". C and B are fine, but \uDD38\uD835 is invalid because the two surrogates are now in the wrong order, which is why you see ?? when you print it. The method should not have swapped the two halves of the surrogate pair; the valid result would be "CB\uD835\uDD38" (CBA, where A is the double-struck A).
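One way to confirm that the broken result is malformed is to scan it for surrogates that are not part of a high-then-low pair. This is only a sketch: hasUnpairedSurrogate is a hypothetical helper written for this illustration, not a JDK method.

```java
public class SurrogateCheck {

    // Returns true if s contains a surrogate char that is not part of a
    // valid high-then-low pair. Hypothetical helper, not part of the JDK.
    static boolean hasUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // valid pair, skip its low surrogate
                } else {
                    return true; // high surrogate with no low surrogate after it
                }
            } else if (Character.isLowSurrogate(c)) {
                return true; // low surrogate with no high surrogate before it
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasUnpairedSurrogate("CB\uD835\uDD38")); // false
        System.out.println(hasUnpairedSurrogate("CB\uDD38\uD835")); // true
    }
}
```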
Solution
Usually, not writing code to solve problems is a good idea:
static String reverse(String original) {
    return new StringBuilder(original).reverse().toString();
}
If you want to take a (small) step further, here's a Java 8+ one-liner:
Function<String, String> reverse = s -> new StringBuilder(s).reverse().toString();
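To see that this version handles the earlier example, it can be run against the same input. A small sketch, where the class name ReverseUsage and the constant REVERSE are names chosen only for this illustration:

```java
import java.util.function.Function;

public class ReverseUsage {

    // The one-liner from above, stored in a constant so it can be reused.
    static final Function<String, String> REVERSE =
            s -> new StringBuilder(s).reverse().toString();

    public static void main(String[] args) {
        String str = "\uD835\uDD38BC"; // double-struck A, then B and C

        String reversed = REVERSE.apply(str);

        // The surrogate pair is kept in high-then-low order.
        System.out.println(reversed.equals("CB\uD835\uDD38")); // true
        System.out.println(reversed.length());                 // 4
    }
}
```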
If you are curious how this is implemented under the hood, check out what StringBuilder's reverse() method does (it is in the AbstractStringBuilder class).
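The Javadoc of reverse() states that surrogate pairs are treated as single characters for the reverse operation. One way to get that behavior, shown here as a simplified sketch and not as the JDK source, is to reverse all char values first and then swap back every pair that ended up in low-then-high order. The class name ReverseSketch is made up for this example.

```java
public class ReverseSketch {

    // Reverses s by code unit, then repairs surrogate pairs that the first
    // pass left in low-then-high order. Illustration only, not the JDK code.
    static String reverse(String s) {
        char[] chars = s.toCharArray();

        // Pass 1: plain reversal of the char array.
        for (int i = 0, j = chars.length - 1; i < j; i++, j--) {
            char tmp = chars[i];
            chars[i] = chars[j];
            chars[j] = tmp;
        }

        // Pass 2: a low surrogate directly followed by a high surrogate is a
        // pair that pass 1 flipped; swap it back.
        for (int i = 0; i < chars.length - 1; i++) {
            if (Character.isLowSurrogate(chars[i]) && Character.isHighSurrogate(chars[i + 1])) {
                char tmp = chars[i];
                chars[i] = chars[i + 1];
                chars[i + 1] = tmp;
                i++; // skip the pair that was just repaired
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        String str = "\uD835\uDD38BC";
        System.out.println(reverse(str).equals("CB\uD835\uDD38")); // true
    }
}
```

In real code, calling StringBuilder's reverse() is still the better choice; the sketch only shows why a second pass over the surrogates is enough to fix the naive reversal.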
So what broken String manipulations have you seen? Let us know in the comments below!
Published at DZone with permission of Jonatan Ivanov. See the original article here.