Dust: Open-Source Actors for Java

Dust integrates a robust Actor system with Java virtual threads. This paradigm removes common problems associated with massively multi-threaded applications.

Alan Littleford

Oct. 21, 24 · Presentation

Virtual Threads

Java 21 introduced virtual threads as a fully supported feature. Unlike regular Java threads (which usually correspond to OS threads), virtual threads are incredibly lightweight. Indeed, an application can create and use 100,000 or more virtual threads simultaneously.

This magic is achieved by two major changes to the JVM:

A virtual thread is managed by the JVM, not the OS. If it is executing, it is bound to a platform thread (known as a carrier); if it is not executing (say it is blocked waiting for some form of notification), the JVM "parks" the virtual thread and frees the carrier thread so it can schedule a different virtual thread.
A platform thread typically has about 1 megabyte of memory preassigned to it for its stack, etc. In contrast, a virtual thread’s stack is managed in the heap and can be as little as a few hundred bytes — growing and shrinking as needed.
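To make the scale concrete, here is a minimal sketch that runs one short blocking task per virtual thread using the standard Java 21 API. The class name VirtualThreadDemo and the method runTasks are made up for this illustration, and the 10 ms sleep simply stands in for any blocking wait.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadDemo {

    // Runs `count` tasks, each on its own virtual thread, and returns how many completed.
    public static int runTasks(int count) {
        AtomicInteger completed = new AtomicInteger();
        // try-with-resources: close() waits for every submitted task to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                executor.submit(() -> {
                    try {
                        // While sleeping, the virtual thread is parked and its carrier is freed.
                        Thread.sleep(10);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    completed.incrementAndGet();
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.println("Completed tasks: " + runTasks(100_000));
    }
}
```

Attempting the same thing with one platform thread per task would typically exhaust memory or hit OS thread limits long before reaching this count.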

The API for managing cooperation and communication between virtual threads is exactly the same as for legacy platform threads. This has good and bad points:

The good: Implementers are familiar with the interface.
The bad: You are still faced with all the usual "hard" parts of multi-threaded applications — synchronized blocks, race conditions, etc. — only now the problem is increased by orders of magnitude.

Moreover, in Java 21 a virtual thread that blocks inside a synchronized block cannot be parked: it stays pinned to its carrier thread. So the more synchronized blocks are used, the less efficient virtual threads become.
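One commonly suggested way to limit this cost, if you stay with shared state, is to guard it with java.util.concurrent.locks.ReentrantLock rather than synchronized, because a virtual thread waiting for such a lock can be parked. The sketch below is illustrative only; LockedCounter and countWithVirtualThreads are names invented for this example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value;

    public void increment() {
        lock.lock(); // a virtual thread waiting here can be parked
        try {
            value++;
        } finally {
            lock.unlock();
        }
    }

    public long get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }

    // Increments one shared counter from many virtual threads and returns the final value.
    public static long countWithVirtualThreads(int threads, int incrementsPerThread) {
        LockedCounter counter = new LockedCounter();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int t = 0; t < threads; t++) {
                executor.submit(() -> {
                    for (int i = 0; i < incrementsPerThread; i++) {
                        counter.increment();
                    }
                });
            }
        } // close() waits for all tasks to finish
        return counter.get();
    }
}
```

This keeps the code correct, but the hard parts of shared-state concurrency remain: every access still has to remember to take the lock.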

What is needed is a new approach: one that can exploit the ability to run millions of virtual threads in a meaningful way while making multi-threaded programming easier. In fact, such a model exists, and it was first discussed 50 years ago: Actors.

Actors and Dust

The Actor concept arose during the 1970s at MIT with research by Carl Hewitt. It is at the core of languages like Erlang and Elixir and of frameworks like Dust: an open-source (Apache 2.0 license) implementation of Actors for Java 21+.

Different implementations of Actors vary in the details, so from now on we will describe the specific Dust Actor model:

An Actor is a Java object associated with exactly one virtual thread.
An Actor has a "mailbox" that receives and queues messages from other Actors. The thread wait()s on this mailbox, retrieves a message, processes it, and returns to waiting for its next message. How the Actor processes messages is called its Behavior.
- Note that if the Actor has no pending messages then, since the mailbox thread is virtual, the JVM will "park" the Actor and reuse its carrier thread. When a message is received, the JVM will un-park the Actor and give it a thread to process the message. This is all transparent to the developer, whose only cares are messages and behaviors.

An Actor may have its own mutable state which is inaccessible outside the Actor. In response to receipt of a message an Actor may:

Mutate its state
Send immutable messages to other Actors
Create or destroy other Actors
Change its Behavior

That’s it. Note that an Actor is single threaded so there are no locking/synchronization issues within an Actor. The only way an Actor can influence another Actor is by sending it an immutable message – so there are no synchronization issues between Actors.
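The model described above can be sketched in plain Java 21 without any framework. This is a toy for intuition only, not how Dust is implemented: ToyCounterActor, its message types, and its methods are all hypothetical names. It shows one virtual thread, one mailbox, immutable messages, and private mutable state touched only by the Actor's own thread.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

public class ToyCounterActor {
    // Immutable messages: records cannot be modified after construction.
    public sealed interface Msg permits Add, Stop {}
    public record Add(int amount) implements Msg {}
    public record Stop() implements Msg {}

    private final BlockingQueue<Msg> mailbox = new LinkedBlockingQueue<>();
    private final CountDownLatch stopped = new CountDownLatch(1);
    private int total; // mutable state, only ever touched by the actor's own thread

    private ToyCounterActor() {}

    // Creates the actor and starts its single virtual thread.
    public static ToyCounterActor start() {
        ToyCounterActor actor = new ToyCounterActor();
        Thread.ofVirtual().start(actor::run);
        return actor;
    }

    // The only way to influence the actor from outside: enqueue a message.
    public void tell(Msg msg) {
        mailbox.add(msg);
    }

    private void run() {
        try {
            while (true) {
                Msg msg = mailbox.take(); // parks the virtual thread while the mailbox is empty
                switch (msg) {
                    case Add add -> total += add.amount();
                    case Stop stop -> { return; }
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            stopped.countDown();
        }
    }

    // Waits for the actor to stop, then returns its final state.
    public int awaitTotal() {
        try {
            stopped.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total;
    }
}
```

Because only the Actor's own thread reads or writes total while it is running, no lock is needed around it, however many other threads call tell().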

The order of messages sent by one Actor to another is preserved by the receiving Actor, but contiguity is not guaranteed: if two Actors send messages to the same Actor at the same time, the two streams may be interleaved, but the order within each stream is preserved.
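The ordering guarantee can be illustrated with a plain FIFO queue standing in for a mailbox. This is an analogy under stated assumptions, not Dust code: OrderingDemo, Numbered, and the helper methods are invented for this sketch. Two senders each enqueue a numbered stream, and the check confirms that each sender's numbers arrive in increasing order even though the two streams may interleave.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OrderingDemo {
    public record Numbered(String sender, int seq) {}

    // Two senders each enqueue 0..perSender-1 into one mailbox; returns the arrival order.
    public static List<Numbered> collect(int perSender) {
        BlockingQueue<Numbered> mailbox = new LinkedBlockingQueue<>();
        Thread a = Thread.ofVirtual().start(() -> send(mailbox, "A", perSender));
        Thread b = Thread.ofVirtual().start(() -> send(mailbox, "B", perSender));
        List<Numbered> received = new ArrayList<>();
        try {
            for (int i = 0; i < 2 * perSender; i++) {
                received.add(mailbox.take());
            }
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    private static void send(BlockingQueue<Numbered> mailbox, String name, int count) {
        for (int i = 0; i < count; i++) {
            mailbox.add(new Numbered(name, i));
        }
    }

    // True if, for every sender, sequence numbers arrive in strictly increasing order.
    public static boolean perSenderOrderPreserved(List<Numbered> received) {
        Map<String, Integer> lastSeen = new HashMap<>();
        for (Numbered msg : received) {
            Integer previous = lastSeen.put(msg.sender(), msg.seq());
            if (previous != null && previous >= msg.seq()) {
                return false;
            }
        }
        return true;
    }
}
```

The exact interleaving of A and B differs from run to run, which is why the check looks only at the order within each sender's stream.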

Actors are managed by an ActorSystem. It has a name, and, optionally a port number. If the port is specified, then Actors in the ActorSystem can receive messages sent remotely — either from another port or another host entirely. The ActorSystem takes care of (de)serialization of messages in the remote case.

An Actor has a unique address which resembles a URL: dust://host:port/actor-system-name/a1/a2/a3.

If you are communicating with Actors in the same Actor System, the URL can be reduced to: /a1/a2/a3.

This is more than a pathname, though: it expresses a parent/child relationship between Actors, namely:

An Actor was created with the name a1. It then created an Actor called a2 : a1 is the "parent" and a2 the "child" of a1. Actor a2 then created a child of its own called a3.

Actors can create many children. The only requirement is their names be distinct from their "siblings."
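To show how an address encodes the parent/child chain, the following sketch pulls one apart with java.net.URI. It does not use any Dust API: ActorAddressDemo and its methods are hypothetical helpers, and the host, port, and system name in main are made-up values.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class ActorAddressDemo {

    // Drops the leading "/actor-system-name" segment, leaving the local path such as "/a1/a2/a3".
    public static String localPath(URI address) {
        String path = address.getPath();
        int secondSlash = path.indexOf('/', 1);
        return secondSlash < 0 ? "" : path.substring(secondSlash);
    }

    // Expands "/a1/a2/a3" into the chain of ancestors: /a1, /a1/a2, /a1/a2/a3.
    public static List<String> lineage(String localPath) {
        List<String> chain = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String name : localPath.split("/")) {
            if (name.isEmpty()) {
                continue; // skip the empty segment before the leading slash
            }
            current.append('/').append(name);
            chain.add(current.toString());
        }
        return chain;
    }

    public static void main(String[] args) {
        // Hypothetical host, port, and ActorSystem name
        URI address = URI.create("dust://localhost:9099/my-system/a1/a2/a3");
        System.out.println("host=" + address.getHost() + " port=" + address.getPort());
        System.out.println("lineage=" + lineage(localPath(address)));
    }
}
```

Each entry in the lineage is the parent of the entry that follows it, so the last entry names the Actor itself.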

Actor Structure

Actors extend the Actor class. It is important to note that Actors are not created directly with a "new" but use a different mechanism. This is needed to set up correct parent-child relationships. We use the Props class for this as in the following simple example:

/**
 * A very simple Actor.
 * Actor, ActorBehavior and Props come from dust-core (imports omitted here).
 */
public class PingPongActor extends Actor {

    // Number of messages this Actor bounces back before stopping itself
    private int max;

    /**
     * Used internally to call the appropriate constructor
     */
    public static Props props(int max) {
        return Props.create(PingPongActor.class, max);
    }

    public PingPongActor(int max) { this.max = max; }

    // Define the initial Behavior
    @Override
    protected ActorBehavior createBehavior() {
        return message -> {
            switch (message) {
                case PingPongMsg msg -> {
                    // Bounce the message back, naming ourselves as the sender
                    sender.tell(message, self);
                    if (0 == --max)
                        stopSelf();
                }
                default -> System.out.println("Strange message …");
            }
        };
    }
}

Actors are created from their Props (see the props() method above), which can also include initialization parameters. So in the above, our PingPongActor initialization includes a max count, whose use we will show shortly.

Actors are created by other Actors, but that chain has to begin somewhere. When an ActorSystem is created, it creates several default top-level Actors, including one called /user.

An application can then create children of this Actor via the ActorSystem:

ActorSystem system = new ActorSystem("PingPong");
ActorRef ping = system.context.actorOf(PingPongActor.props(1000000), "ping");

The context of an ActorSystem provides the actorOf() method, which creates children of the /user Actor. Actors themselves have an identical actorOf() for creating their children.

If we now looked into the ActorSystem, we would see a new PingPongActor whose name is ping and whose path is /user/ping. The value returned by this creation step is an ActorRef — a "handle" to that particular Actor. Let's build another:

    ActorRef pong = system.context.actorOf(PingPongActor.props(1000000), "pong");

So now we have two instances of PingPongActor, each with its "max" state set to 1000000, and both are waiting to receive messages in their mailboxes. When a message arrives, the Actor passes it to the lambda returned by createBehavior(), which implements our behavior. So what does this behavior do?

First, we need a nice message class to get things fired up:

    public class PingPongMsg implements Serializable {}

The only constraint on messages is they must be serializable.
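A Java record is a convenient way to get a message that is both immutable and serializable. The article does not say which wire format Dust uses for remote delivery, so the sketch below uses Java's built-in object serialization purely to show a message surviving a round trip. PriceUpdate and MessageRoundTrip are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class MessageRoundTrip {
    // Immutable, serializable message with a payload (hypothetical example type)
    public record PriceUpdate(String symbol, double price) implements Serializable {}

    // Serializes a message to bytes, as would be needed before sending it to another process.
    public static byte[] toBytes(Serializable message) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Rebuilds a message from bytes, as a receiving process would.
    public static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PriceUpdate sent = new PriceUpdate("ACME", 12.5);
        Object received = fromBytes(toBytes(sent));
        System.out.println(sent.equals(received));
    }
}
```

The received object is a copy that is equal to the original but is not the same instance, which is one reason immutable messages fit the remote case so naturally.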

So now let’s look at our setup:

    ActorSystem system = new ActorSystem("PingPong");
    ActorRef ping = system.context.actorOf(PingPongActor.props(1000000), "ping");
    ActorRef pong = system.context.actorOf(PingPongActor.props(1000000), "pong");

    pong.tell(new PingPongMsg(), ping);

ActorRefs have a tell() method which takes a Serializable message object and a (nullable) ActorRef. Thus, in the above an instance of PingPongMsg is delivered to the Actor at pong. Since the second argument was not null, that ActorRef (ping) is available as the "sender" variable in the recipient's behavior.

Recall that the part of the behavior that dealt with a PingPongMsg was:

        case PingPongMsg msg -> {
            sender.tell(message, self);
            if (0 == --max)
                stopSelf();
        }

The sender of this message gave me his ActorRef (ping) so I am simply sending the message back to him, telling him that I (pong) am the sender via the self variable. Rinse, lather, and repeat one million times. So the same message will have been passed back and forth two million times in total between the two Actors, and once their counters hit 0, each Actor will destroy itself.
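The message count can be checked with a framework-free simulation. This is only a stand-in for the Dust example: PingPongSimulation, Envelope, and actorLoop are invented for this sketch, with a queue playing the role of each mailbox and an envelope carrying the sender's mailbox in place of an ActorRef.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

public class PingPongSimulation {
    // Stand-in for (message, sender): carries the mailbox to reply to.
    private record Envelope(BlockingQueue<Envelope> sender) {}

    // Runs two toy actors with the given max and returns the total number of messages handled.
    public static long run(int max) {
        AtomicLong handled = new AtomicLong();
        BlockingQueue<Envelope> pingBox = new LinkedBlockingQueue<>();
        BlockingQueue<Envelope> pongBox = new LinkedBlockingQueue<>();
        Thread ping = Thread.ofVirtual().start(() -> actorLoop(pingBox, max, handled));
        Thread pong = Thread.ofVirtual().start(() -> actorLoop(pongBox, max, handled));

        // Analogous to pong.tell(new PingPongMsg(), ping)
        pongBox.add(new Envelope(pingBox));
        try {
            ping.join();
            pong.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handled.get();
    }

    private static void actorLoop(BlockingQueue<Envelope> self, int max, AtomicLong handled) {
        int remaining = max;
        try {
            while (remaining > 0) {
                Envelope message = self.take();           // wait for the next message
                handled.incrementAndGet();
                message.sender().add(new Envelope(self)); // analogous to sender.tell(message, self)
                remaining--;                              // analogous to --max; the loop ends at 0
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("Messages handled: " + run(1_000_000));
    }
}
```

Each toy actor handles exactly max messages before it stops, so the two together handle 2 × max. The final reply is sent to a mailbox whose owner has already stopped and is never read.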

Beyond PingPong

PingPongActor was just about the simplest example capable of giving a feel for Actors and Dust, but is clearly of limited value otherwise. GitHub contains several Dust repos which constitute a small library around the Dust framework.

dust-core – The heart of Dust: Actors, persistent Actors, various structural Actors for building pipelines, scalable servers, etc.
- Programmer documentation
dust-http – Small library to make it easy for Actors to access Internet endpoints, etc.
dust-html – A small library to make manipulating web page content easy in idiomatic Dust
dust-feeds – Actors to access RSS feeds, crawl websites, and use SearXNG for web searches
dust-nlp – Actors to access ChatGPT (and similar) endpoints and the Hugging Face embeddings API

The Actor paradigm is an ideal match for event-driven scenarios. Dust has been used to create systems such as:

Intelligent news reader using LLMs to identify and follow trending topics
Building occupancy management using WiFi signal strengths as proxies for people
A digital twin of a toy town – 8000 Actors just to simulate flocking birds!
A system to find and analyze data for M&A activities

Opinions expressed by DZone contributors are their own.
