High-Performance Persistence With MicroStream (Part Two)

Read on!

Sven Ruppert

CORE ·

Oct. 29, 19 · Analysis

Likes (1)

Comment

Save

4.6K Views

Keep pushing for the best performance possible!

For some time, there has been a new competitor in the field of persistence and serialization. We are talking about Project MicroStream. What is it exactly? MicroStream claims to be a high-performance and, most importantly, developer-friendly solution for the challenges of serialization and persistence. How easy, fast and comfortable that is, we will look at in detail in a multi-part series.

You may also like: High Performing Persistance With MicroStream (Part One)

What Happened Until Now

In the last part, we looked at how to prepare a project for the MicroStream Storage-Engine so that we can start the experiments. For the first practical experiments, a class with exactly one attribute was created, and this was instantiated several times, written to disk (persisted) and subsequently brought back to life (loaded and instantiated). Also tried was the Kotlin variant of this HelloWorld class.

What's up Today

In our last attempts, we did not care how to handle many elements. Here in Java, we have the most diverse data structures that are already part of the JDK itself. We will take a closer look today. But before we deal with a variety of elements, we will first solve a little something that we have just ignored the last time.

The speech here is that in our last part of the series, the storage engine started with the default values. As a result, all tests used the same space to store their elements. Of course, that's not practical. If you start now a MicroStream - Storage-Engine you can also configure their behavior.

For the current examples, a separate directory created on the hard disk of the computer under the directory target/storage for each test. Since we manage the project with maven, the folder will be deleted with each call of the command mvn clean. This practice ensures that more or less fixed intervals of the space used for the tests are released on the storage medium.

For the first tests, we will look at a practical solution that is not meant to be used on a large scale. Since we use jUnit5 in conjunction with the standard attached TestEngine Jupiter, we can inject a parameter of the type TestInfo into each test method. This holder contains information about the test itself.

Among other things, the name of the test class and the test method are provided. So we have everything together to create a separate directory for each test, which we can assign later as a human again each test. At this point, I deliberately ignore error handling to keep the examples as small as possible.

public static String TARGET_PATH = "./target/storage/";

public Function<TestInfo, File> infoToCleanFolder = (info) -> {
  final Class<?> aClass     = info.getTestClass().get();
  final Method   method     = info.getTestMethod().get();
  final File     tempFolder = new File(TARGET_PATH, 
                                       aClass.getSimpleName() + "_" + method.getName());
  if (tempFolder.exists()) {
    try {
      walk(tempFolder.toPath()).sorted(reverseOrder())
                               .map(Path::toFile)
                               .forEach(File::delete);
    } catch (IOException e) {
      logger().warning(e.getMessage());
    }
  }
  return tempFolder;
};

An individual directory for the respective test is generated in the directory target/storage. If a folder already exists from a previous test run, this on-demand call will delete and recreate it. This approach will lead to the configuration to launch the MicroStream Storage Engine with the following lines of source code.

final File                   tempFolder      = infoToCleanFolder.apply(info);
final EmbeddedStorageManager storageManagerA = EmbeddedStorage.start(tempFolder);

After all, tests from the previous part have been rebuilt; accordingly, all tests should be run successfully in any order.

Let's go to our first attempt to save a list of items. The implementation for this example is in the class ListOfElementsTest. RootNode is the class named CollectionRootNode. It only has one attribute again, this time of type Collections<T>.

public class CollectionRootNode<T> {
  private final Collection<T> elements;

  CollectionRootNode(Collection<T> elements) {this.elements = elements;}

  public Collection<T> getElements() {
    return elements;
  }
}

Again, the first test consists of writing and reading a list of elements. Using the class IntStream generates a set of 100_000 items that are subsequently passed to the instance RootNode and stored. A first measurement showed me a time of 215ms on my now quite some years old Mac Book. I expressly ask you not to pay particular attention to this value, which is not a result obtained with specific care.

public class HashMapRootNode {
  private final Map<Integer, HelloWorldImmutable> elements;
  public HashMapRootNode(Map<Integer, HelloWorldImmutable> elements) {
    this.elements = elements;
  }
  public Map<Integer, HelloWorldImmutable> getElements() {
    return elements;
  }
}

@Test
void test001(TestInfo info, TestReporter reporter) {
  final File                   tempFolder      = infoToCleanFolder().apply(info);
  final EmbeddedStorageManager storageManagerA = EmbeddedStorage.start(tempFolder);

  final Map<Integer, HelloWorldImmutable> elements 
    = IntStream.range(0, 100_000)
               .mapToObj(i -> Pair.next(i, new HelloWorldImmutable(i + "")))
               .collect(toMap(Pair::getT1, Pair::getT2));

  final Instant start = now();
  storageManagerA.setRoot(new HashMapRootNode(elements));
  storageManagerA.storeRoot();
  final Instant stop = now();
  reporter.publishEntry("duration store [ms] " + between(start, stop).toMillis());

  storageManagerA.shutdown();

  final EmbeddedStorageManager storageManagerB = EmbeddedStorage.start(tempFolder);
  final HashMapRootNode        rootAgain       = (HashMapRootNode) storageManagerB.root();

  Assertions.assertEquals(elements, rootAgain.getElements());

  storageManagerB.shutdown();
}

Again, the data is stored in 250ms on the disk.

From Lists to Trees

So far, we have stored only lists or simple key/value pairs. But how does it look with its own and the meshing more complex structures? For this, we first create an example that is also theoretical in nature.

For this experiment, I create a class named BinaryTree, and the only value stored per element in this binary tree is of type Integer. The nodes themselves have been realized by a class called Node.

public class Node {
  public Node(int id) { this.id = id; }
  private final int  id;
  private       Node left;
  private       Node right;
  private       Node parent;
  //Getter - Setter - skipped
}

public class BinaryTree
    implements HasLogger {

  private Node rootNode;

  private Node add(Node current, int value) {
    if (current == null) { return new Node(value); }
    if (value < current.getId()) current.setLeft(add(current.getLeft(), value));
    else if (value > current.getId()) current.setRight(add(current.getRight(), value));
    else return current; // value already exists
    return current;
  }

  public void add(int value) { rootNode = add(rootNode, value); }

  public List<Integer> traverseLevelOrder() {
    if (rootNode == null) return Collections.emptyList();
    List<Integer>     nodeIDs = new ArrayList<>();
    final Queue<Node> nodes   = new LinkedList<>();
    nodes.add(rootNode);
    while (!nodes.isEmpty()) {
      Node node = nodes.remove();
      nodeIDs.add(node.getId());
      if (node.getLeft() != null) { nodes.add(node.getLeft()); }
      if (node.getRight() != null) { nodes.add(node.getRight()); }
    }
    return nodeIDs;
  }

I did not implement some features for this test. The missing features include, for example, the ability to remove nodes from the tree or to check the tree for the presence of a particular node. Here at this point, I want to build a tree using self-defined classes, then fill in with data and then save. The subsequent reading of this data structure should generate the same image again in memory.

The test used for this is implemented using JUnit5. An arbitrary set of values is generated utilizing a random number generator and then transferred to the tree. As in the previous examples shown, the root-node, including the elements we want to persist, is stored by the MicroStream storage-engine. After shutting down the used instance of the storage-engine, a new instance will be created to read the root-node back into memory. Both instances of type BinaryTree must be compared if they are equal.

To simplify the comparison process of these two instances, the class BinaryTree is offering a method with the name traverseLevelOrder(), which returns a list of integer values. So you can compare then both lists of type Integer with the JUnit5 - assertions.

@Test
void test001(TestInfo info, TestReporter reporter) {
  final File                   tempFolder      = infoToCleanFolder().apply(info);
  final EmbeddedStorageManager storageManagerA = EmbeddedStorage.start(tempFolder);

  final BinaryTree tree = new BinaryTree();
  new Random().ints()
              .limit(1_000)
              .forEach(tree::add);

  final Instant start = now();
  storageManagerA.setRoot(tree);
  storageManagerA.storeRoot();
  final Instant stop = now();
  reporter.publishEntry("duration store [ms] " + between(start, stop).toMillis());
  storageManagerA.shutdown();

  final EmbeddedStorageManager storageManagerB = EmbeddedStorage.start(tempFolder);
  final BinaryTree             rootAgain       = (BinaryTree) storageManagerB.root();
  Assertions.assertEquals(tree.traverseLevelOrder(), rootAgain.traverseLevelOrder());
  storageManagerB.shutdown();
}

Conclusion

We have seen in this section that storing and loading binary trees is very easy to use. If you have to deal with the challenge of mapping trees in XML or JSON, you will be able to appreciate how attractive this solution is. It is also complicated if you have to store and manage a data structure like this tree in an RDBMS.

In the next part, we will look at if and how we can work with graphs. I must confess at this point that I do not know the result myself, now. I am curious about what will happen. Another topic will be one of the next parts on how to deal with large amounts of data managed by the MicroStream engine.

Happy Coding!

High-Performance Persistence With MicroStream (Part Two)

Read on!

What Happened Until Now

What's up Today

From Lists to Trees

Conclusion

Further Reading

Partner Resources

Related

Trending

High-Performance Persistence With MicroStream (Part Two)

Read on!

What Happened Until Now

What's up Today

From Lists to Trees

Conclusion

Further Reading

Related

Partner Resources