Improve Your Application's Serialization Through Efficient Object Marshaling
Demonstrating how to encode small Strings into long primitives using object marshaling examples and how to improve the performance of your app’s serialization.
Efficient code doesn't just run faster; if it uses fewer compute resources, it may also be cheaper to run. In particular, distributed cloud applications can benefit from fast, lightweight serialization.
Open-Source Java Serializer
Chronicle Wire is an open-source Java serializer that can read and write different message formats, such as JSON, YAML, and raw binary data. It aims for a middle ground between compact data formatting (storing more data in the same space) and data compression (reducing the amount of storage required): data is stored in as few bytes as possible without degrading performance. This is done by marshaling an object.
What Is Object Marshaling? Why Use It?
Marshaling is another name for serialization: the process of transforming an object’s in-memory representation into another format. With Wire, the marshaling code is agnostic of the written format, so the same code can generate or read YAML, JSON, or binary representations. Because we can generate human-readable representations, we can trivially implement a toString() method by writing to a readable Wire instance (and likewise equals and hashCode, assuming the serialized form is equivalent to the object's identity). Moreover, for readable Wire instances, we can write numeric values as strings (e.g., a timestamp long converter), or use long conversion to store short text values in numeric form for compact writes to binary representations. This allows you to choose the format most appropriate for the application. For instance, we can use YAML when reading hand-crafted config files, and binary when sending data over a wire to another machine or storing it in a machine-readable file. We can also convert between them: to debug binary messages going over a wire, we can read from the binary format and log in YAML. This can all be done with the same code.
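To make the idea of format-agnostic marshaling concrete, here is a minimal plain-Java sketch. It does not use Chronicle Wire: the FieldWriter interface, the two writer classes, and the rooms field are hypothetical names invented for illustration. The point is only that the object describes its fields once, and the chosen writer decides how they are rendered.

```java
public class FormatAgnosticSketch {

    /** Hypothetical output abstraction; Chronicle Wire's real API is richer than this. */
    public interface FieldWriter {
        void write(String name, Object value);
        String result();
    }

    /** Renders each field as a YAML-style "name: value" line. */
    public static class YamlLikeWriter implements FieldWriter {
        private final StringBuilder sb = new StringBuilder();
        @Override public void write(String name, Object value) {
            sb.append(name).append(": ").append(value).append('\n');
        }
        @Override public String result() { return sb.toString(); }
    }

    /** Renders the fields as a flat JSON object (no escaping; sketch only). */
    public static class JsonLikeWriter implements FieldWriter {
        private final StringBuilder sb = new StringBuilder();
        @Override public void write(String name, Object value) {
            sb.append(sb.length() == 0 ? "{" : ",");
            sb.append('"').append(name).append("\":");
            if (value instanceof Number) sb.append(value);
            else sb.append('"').append(value).append('"');
        }
        @Override public String result() { return sb.length() == 0 ? "{}" : sb + "}"; }
    }

    public static class House {
        String owner = "Bill";
        int rooms = 4; // hypothetical extra field, for illustration only

        /** The marshaling code is written once, independent of the output format. */
        void writeTo(FieldWriter out) {
            out.write("owner", owner);
            out.write("rooms", rooms);
        }
    }

    public static String toYaml(House house) {
        FieldWriter out = new YamlLikeWriter();
        house.writeTo(out);
        return out.result();
    }

    public static String toJson(House house) {
        FieldWriter out = new JsonLikeWriter();
        house.writeTo(out);
        return out.result();
    }

    public static void main(String[] args) {
        House house = new House();
        System.out.print(toYaml(house));   // owner: Bill, then rooms: 4, one per line
        System.out.println(toJson(house)); // {"owner":"Bill","rooms":4}
    }
}
```

Swapping the writer changes the output format without touching House.writeTo, which is the property the paragraph above attributes to Wire.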
LongConverter Example
This example walks through a simple Plain Old Java Object (POJO).
We start the process by storing a String object as a long. A Base64LongConverter is used here to parse the provided CharSequence and return the result as a long. The example code can be seen in LongConversionExampleA.
public class LongConversionExampleA {
public static class House {
long owner;
public void owner(CharSequence owner) {
this.owner = Base64LongConverter.INSTANCE.parse(owner);
}
@Override
public String toString() {
return "House{" +
"owner=" + owner +
'}';
}
}
public static void main(String[] args) {
House house = new House();
house.owner("Bill");
System.out.println(house);
}
}
This then prints out the house owner’s name as a number, as it has been stored as a long:
House{owner=670118}
Printing YAML Example
We can then extend this class to use one of Chronicle’s base classes, SelfDescribingMarshallable, which allows us to trivially implement a toString() method whose output can be used to reconstruct the object. This is useful for building sample data in unit tests from a file. It also means you can take the dump of an object in a log file and reconstruct the original object. The code below also demonstrates addAlias, which lets us refer to House rather than to net.openhft.chronicle.LongConversionExampleB$House.
LongConversionExampleB illustrates how to print out the output as YAML:
public class LongConversionExampleB {
static {
ClassAliasPool.CLASS_ALIASES.addAlias(LongConversionExampleB.House.class);
}
public static class House extends SelfDescribingMarshallable {
@LongConversion(Base64LongConverter.class)
long owner;
public void owner(CharSequence owner) {
this.owner = Base64LongConverter.INSTANCE.parse(owner);
}
}
public static void main(String[] args) {
House house = new House();
house.owner("Bill");
System.out.println(house);
}
}
When running this, instead of printing a number, the following is printed:
!House {
  owner: Bill
}
Printing JSON Example
If we want the output to be JSON, we can remove the following line from LongConversionExampleB:
System.out.println(house);
We then replace it with a Wire, which is a more lightweight alternative:
Wire wire = WireType.JSON.apply(Bytes.allocateElasticOnHeap());
wire.getValueOut().object(house);
System.out.println(wire);
This outputs the following:
{"owner": "Bill"}
Why is this helpful? Why can we not just store this originally as a String rather than a long?
Storing this as a long is more efficient. Although a long occupies only 8 bytes, by using @LongConversion(Base64LongConverter.class) we can store up to 10 Base64-encoded characters in it.
How is this possible?
Typically, a character is stored in a byte, and an 8-bit byte can represent one of 256 different characters.
By using Base64LongConverter, we instead restrict each character to one of only 64 symbols, which needs just 6 bits rather than 8:
.ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+
By limiting the number of characters that can be represented, we can pack more characters into a long: 10 characters at 6 bits each take 60 bits, which fits within the 64 bits of a long.
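The packing can be sketched in plain Java as follows. This is an illustration of the 6-bits-per-character idea using the alphabet quoted above, not Chronicle Wire's actual implementation; the class and method names are made up for this example. With this alphabet, the sketch maps "Bill" to 670118, the same number that LongConversionExampleA prints.

```java
public class SixBitPackingSketch {
    // The 64-symbol alphabet quoted in the article; a symbol's index is its 6-bit code.
    static final String SYMBOLS =
            ".ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+";
    // 10 symbols x 6 bits = 60 bits, which fits in a 64-bit long.
    static final int MAX_LENGTH = 10;

    /** Packs up to 10 characters into a long, 6 bits per character. */
    public static long encode(CharSequence text) {
        if (text.length() > MAX_LENGTH)
            throw new IllegalArgumentException("at most " + MAX_LENGTH + " characters");
        long value = 0;
        for (int i = 0; i < text.length(); i++) {
            int code = SYMBOLS.indexOf(text.charAt(i));
            if (code < 0)
                throw new IllegalArgumentException("unsupported character: " + text.charAt(i));
            value = (value << 6) | code; // shift earlier characters up, append this one
        }
        return value;
    }

    /**
     * Unpacks a long produced by encode. Because '.' has code 0, leading '.'
     * characters cannot be recovered in this simple scheme.
     */
    public static String decode(long value) {
        StringBuilder sb = new StringBuilder();
        while (value != 0) {
            sb.append(SYMBOLS.charAt((int) (value & 63))); // lowest 6 bits = last character
            value >>>= 6;
        }
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        long packed = encode("Bill");
        System.out.println(packed);         // prints 670118
        System.out.println(decode(packed)); // prints Bill
    }
}
```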
What if these 64 characters do not include the characters you need? Or what if there are still too many?
Chronicle Wire has different versions of this LongConverter, from a Base64LongConverter to a Base32LongConverter. Furthermore, it is also possible to customize your own base encoding. After all, fewer characters result in a more compact way of storing data, which, in turn, means the data is faster to both read and write, and who wouldn’t want that?
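The trade-off between alphabet size and capacity is simple arithmetic, sketched below. The helper name maxChars is hypothetical, and the figures are upper bounds derived from the 64 bits of a long treated as unsigned; the limits enforced by Chronicle Wire's own converters may differ.

```java
import java.math.BigInteger;

public class SymbolCapacitySketch {

    /**
     * Returns the largest n such that symbols^n <= 2^64, i.e. the longest text
     * whose every combination still fits in one long treated as unsigned.
     */
    public static int maxChars(int symbols) {
        if (symbols < 2)
            throw new IllegalArgumentException("need at least 2 symbols");
        BigInteger limit = BigInteger.ONE.shiftLeft(64); // number of distinct long values
        BigInteger base = BigInteger.valueOf(symbols);
        BigInteger combinations = base;                  // symbols^(n + 1) at loop entry
        int n = 0;
        while (combinations.compareTo(limit) <= 0) {
            n++;
            combinations = combinations.multiply(base);
        }
        return n;
    }

    public static void main(String[] args) {
        for (int symbols : new int[] {256, 64, 32, 16}) {
            System.out.println(symbols + " symbols -> up to " + maxChars(symbols) + " characters per long");
        }
    }
}
```

Under these assumptions, 256 symbols allow 8 characters per long, 64 allow 10, 32 allow 12, and 16 allow 16, so a smaller alphabet buys longer text in the same 8 bytes.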
Field Group Example
While the example above works well for storing a small number of characters, how about something longer, such as a house address?
This is where we can make use of @FieldGroup from Chronicle Bytes:
import net.openhft.chronicle.bytes.Bytes;
import net.openhft.chronicle.bytes.FieldGroup;
In LongConversionExampleC below, we walk through how to store text across several longs using a FieldGroup. In this example, the @FieldGroup spans 5 longs, giving 40 bytes: one byte for the length and up to 39 characters.
A benefit of storing this in primitive longs can be seen in the article "How To Get C++ Speed in Java Serialization."
public static class House extends SelfDescribingMarshallable {
@FieldGroup("address")
// 5 longs, each at 8 bytes = 40 bytes, so we can store a String with up to 39 ISO-8859-1 characters (as the first byte contains the length)
private long text4a, text4b, text4c, text4d, text4e;
private transient Bytes address = Bytes.forFieldGroup(this, "address");
public void address(CharSequence owner) {
address.append(owner);
}
}
The example continues below to illustrate how to create a Bytes buffer, write the house object to it, and then read it back.
public static void main(String[] args) {
House house = new House();
house.address("82 St John Street, Clerkenwell, London");
// creates a buffer to store bytes
final Bytes<?> t = allocateElasticOnHeap();
// the encoding format
final Wire wire = BINARY.apply(t);
// writes the house object to the bytes
wire.getValueOut().object(house);
// dumps out the contents of the bytes
System.out.println(t.toHexString());
System.out.println(t);
// reads the house object from the bytes
final House object = wire.getValueIn().object(House.class);
// prints the address held in the text4a..text4e field group
System.out.println(object.address);
}
As we are using toHexString(), this example prints out our data as a standard hex dump, as seen in Figure 1. The section in green is the offset: the number of bytes from the beginning of the data to the current position. The section in red shows the hex values of the stored data. To read this, we can take the hex number 48 (in the top row) and convert it to decimal: hex 48 is decimal 72. An ASCII character chart then tells us that 72 is the character ‘H’. If we look in the blue section, which shows the ASCII/ISO-8859 rendering, we see that this corresponds to the 3rd character, ‘H’.
Figure 1. toHexString( ) Output
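The layout described in the field-group comment (a length byte followed by up to 39 single-byte characters, spread over five longs) can be sketched with java.nio alone. This is an assumption-laden illustration, not how Bytes.forFieldGroup is implemented: the class name, the big-endian byte order, and the pack and unpack helpers are all choices made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FieldGroupSketch {
    static final int LONGS = 5;
    // 5 longs x 8 bytes = 40 bytes; one byte holds the length, leaving 39 for text.
    static final int CAPACITY = LONGS * Long.BYTES - 1;

    /** Packs a length-prefixed ISO-8859-1 string into five longs (big-endian). */
    public static long[] pack(String text) {
        byte[] chars = text.getBytes(StandardCharsets.ISO_8859_1);
        if (chars.length > CAPACITY)
            throw new IllegalArgumentException("at most " + CAPACITY + " characters");
        ByteBuffer buf = ByteBuffer.allocate(LONGS * Long.BYTES); // zero-filled
        buf.put((byte) chars.length); // first byte: the length
        buf.put(chars);               // then one byte per character
        buf.rewind();
        long[] longs = new long[LONGS];
        buf.asLongBuffer().get(longs);
        return longs;
    }

    /** Reads the length byte, then that many characters, back out of the longs. */
    public static String unpack(long[] longs) {
        ByteBuffer buf = ByteBuffer.allocate(LONGS * Long.BYTES);
        buf.asLongBuffer().put(longs);
        int length = buf.get(0) & 0xFF;
        return new String(buf.array(), 1, length, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        long[] packed = pack("82 St John Street, Clerkenwell, London");
        for (long l : packed) {
            System.out.println(Long.toHexString(l)); // the raw field values in hex
        }
        System.out.println(unpack(packed));
    }
}
```

In this sketch, packing the single character "H" puts 01 (the length) in the top byte of the first long and 48 (hex for ‘H’) in the next byte, which mirrors how the hex dump is read.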
@Base64
As seen in the examples above, we have used:
@LongConversion(Base64LongConverter.class)
It should be noted that this can be simplified to just:
@Base64
An example of this being implemented can be seen in the code block below:
package net.openhft.chronicle.wire;
import net.openhft.chronicle.bytes.Bytes;
import net.openhft.chronicle.wire.converter.Base64;
public class Example {
public static class Base64LongConverterValue extends SelfDescribingMarshallable {
@LongConversion(Base64LongConverter.class)
long value;
public Base64LongConverterValue value(String msg) {
value = Base64LongConverter.INSTANCE.parse(msg);
return this;
}
}
public static class Base64Value extends SelfDescribingMarshallable {
@Base64
long value;
public Base64Value value(String msg) {
value = Base64.INSTANCE.parse(msg);
return this;
}
}
public static void main(String[] args) {
start();
}
private static void start() {
Bytes<?> b = Bytes.allocateElasticOnHeap();
Wire w = WireType.JSON.apply(b);
w.getValueOut().object(new Base64Value().value("hello"));
System.out.println(w.toString());
}
}
Creating Your Own Annotations
It is easy to create your own Base64 annotation that contains your own selection of 64 characters. Below is how we create the @Base64 annotation, which makes use of the SymbolsLongConverter.
package net.openhft.chronicle.wire.converter;
import net.openhft.chronicle.wire.*;
import java.lang.annotation.*;
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.PARAMETER})
@LongConversion(Base64.class)
public @interface Base64 {
LongConverter INSTANCE = new SymbolsLongConverter(
".ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_");
}
Adding a Timestamp
The example below demonstrates how a long timestamp annotated with @NanoTime is written in a readable date-time form every time you create an event.
public class NanoTimeTest {
@Test
public void yaml() {
Wire wire = Wire.newYamlWireOnHeap();
UseNanoTime writer = wire.methodWriter(UseNanoTime.class);
long ts = NanoTime.INSTANCE.parse("2022-06-17T12:35:56");
writer.time(ts);
writer.event(new Event(ts));
assertEquals("" +
"time: 2022-06-17T12:35:56\n" +
"...\n" +
"event: {\n" +
" start: 2022-06-17T12:35:56\n" +
"}\n" +
"...\n", wire.toString());
}
interface UseNanoTime {
void time(@NanoTime long time);
void event(Event event);
}
static class Event extends SelfDescribingMarshallable {
@NanoTime
private long start;
Event(long start) {
this.start = start;
}
}
}
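The conversion between a readable timestamp and a long can be sketched with java.time alone. This is not the NanoTime converter itself: it assumes the long holds nanoseconds since the Unix epoch in UTC, and the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class NanoTimestampSketch {
    static final long NANOS_PER_SECOND = 1_000_000_000L;

    /** Parses an ISO-8601 local date-time as nanoseconds since the epoch (UTC assumed). */
    public static long parse(String text) {
        LocalDateTime t = LocalDateTime.parse(text);
        long seconds = t.toEpochSecond(ZoneOffset.UTC);
        // The exact-arithmetic helpers throw rather than overflow silently.
        return Math.addExact(Math.multiplyExact(seconds, NANOS_PER_SECOND), t.getNano());
    }

    /** Formats nanoseconds since the epoch back into an ISO-8601 local date-time. */
    public static String asString(long nanos) {
        long seconds = Math.floorDiv(nanos, NANOS_PER_SECOND);
        int nanoOfSecond = (int) Math.floorMod(nanos, NANOS_PER_SECOND);
        return LocalDateTime.ofEpochSecond(seconds, nanoOfSecond, ZoneOffset.UTC)
                .format(DateTimeFormatter.ISO_LOCAL_DATE_TIME);
    }

    public static void main(String[] args) {
        long ts = parse("2022-06-17T12:35:56");
        System.out.println(ts);           // the compact numeric form
        System.out.println(asString(ts)); // prints 2022-06-17T12:35:56
    }
}
```

The value is stored and compared as a primitive long, while the readable form is produced only when writing to a text format.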
JLBH Benchmark Performance
To explore the efficiency of these examples, the TriviallyCopyableJLBH.java test was created. As can be seen on lines 23-26 of that test, we can switch between running the TriviallyCopyable House (“House1”) and the BinaryWire House (“House2”). It is important to note that trivially copyable objects were used to improve Java serialization speeds. The benchmark shows that we can serialize and then deserialize 100,000 messages a second. The TriviallyCopyable version is even faster, especially at the higher percentiles.
Figure 2. Benchmark Performance Between TriviallyCopyable and BinaryWire
*Microseconds to both serialize and deserialize a message
Conclusion
Overall, LongConversion is beneficial because comparing primitive longs is more efficient than comparing Strings, even taking into account that Strings can initially be compared using their hashCode(). Moreover, primitive longs are stored directly within the object (the House object in these examples), so accessing them avoids the level of indirection incurred when accessing an object, such as a String, through its reference.
Storing the data in primitives allows TriviallyCopyable objects to be serialized by simply copying the memory of the Java object as serialized bytes. The graph above shows that this technique improves both serialization and deserialization latencies.
Published at DZone with permission of Jasmine Taylor. See the original article here.