Digging Into Sockets With Java Flight Recorder
Let's look at one amazing tool for gathering information about what's going on in your JVM — the Java Flight Recorder.
Join the DZone community and get the full member experience.
Join For FreeA lot of people use HTTP APIs to communicate between their services (let's have another article about whether it's a good solution or not), and in a majority of cases, it works fine. However, we can get into situations where performance really matters, and especially for libraries writers, JFR can be a very useful tool to see I/O works in their code.
You may also like: Using Java Flight Recorder With OpenJDK 11
This article is not an introduction to Flight Recorder, it's rather more practical. If you want to learn more about this technology, you can start with JEP 328: Flight Recorder.
Introduction to Our Netty Application
As an example, I decided to use a great project called Netty that simplifies the JDK NIO implementation and provides a lot of tricks we can use to optimize our application. Let's dive into what we are going to use for our experiments.
Feel free to check out my jfr-sockets project on GitHub and try out everything with me. The project actually contains two applications based on TCP. The first one in the package simple is just a dump TCP Client/Server and the client pushes every 1s a string my-message that takes exactly 10 bytes. And the second one is Netty-based Client/Server application where the server pushes every 1s the same string as before but using WebSocket. I deliberately chose those types of applications, and we'll look at the details of why I did it.
You can also notice a String constant called CONFIG. I'm using a trick where the file with the proper configuration is created this way, instead of changing a regular JFR configuration file. If you want to see configuration files that are in JDK, please, open this folder ${JAVA_HOME}/lib/jfr
. Very likely you spotted two files default.jfc and profile.jfc. If you open it, you can see XML configuration of JFR events. This configuration says when the event is supposed to be emitted. The configuration file default has smaller overhead (~ 1%), and it's useful for continuous profiling whereas the profile configuration is much aggressive and emits more events with lower thresholds (~ 2% overhead).
We needed to create our own settings because, by default, there is a 20 ms threshold for reading and writing but we want to see every operation.
Notice that I'm using a new feature that's available since Java 14 JEP 349: JFR Event Streaming, which means if you want to try it out, you need to download it from https://jdk.java.net/14/.
Run a Simple Blocking Application
The very first example is a class pbouda.jfr.sockets.simple.Start. If you try it out, you can see the output similar to one below. As we mentioned earlier, the application writes UTF String my-message (10 bytes), and if you look inside the methods, we actually got in the stacktrace
field, 10-byte string is encoded into 12-byte message.
Since TCP is based on streams, we need a well-known delimiter between two messages, or we need to encode the size of the message somehow. JDK implementation of writeUTF
and readUTF
is based on the second option. JDK uses the first two bytes as a short type, which determines the number of bytes in the message itself.
[0, 10, 109, 121, 45, 109, 101, 115, 115, 97, 103, 101]
According to jdk.SocketWrite event, we can see that we wrote 12-byte array (2 bytes for length, 10 bytes for a payload).
In terms of jdk.SocketRead, we can notice two events belonging to DataInputStream#readUnsignedShort
. As we mentioned above, first, we need to read the size of the payload (unsigned short). This operation emitted two events because it always calls SocketInputStream#read
twice to read short value (one byte per operation). The very last event already contains the payload and calls DataInputStream#readFully
.
xxxxxxxxxx
jdk.SocketWrite {
startTime = 20:03:57.651
duration = 0.185 ms
host = "localhost"
address = "127.0.0.1"
port = 52960
bytesWritten = 12 bytes
eventThread = "Thread-0" (javaThreadId = 18)
stackTrace = [
java.net.Socket$SocketOutputStream.write(byte[], int, int) line: 65
java.io.DataOutputStream.write(byte[], int, int) line: 106
java.io.DataOutputStream.writeUTF(String, DataOutput) line: 397
java.io.DataOutputStream.writeUTF(String) line: 325
pbouda.jfr.sockets.simple.Start$ClientHandler.run() line: 67
...
]
}
jdk.SocketRead {
startTime = 20:03:56.558
duration = 1.09 s
host = "localhost"
address = "127.0.0.1"
port = 5000
timeout = 0 s
bytesRead = 1 byte
endOfStream = false
eventThread = "main" (javaThreadId = 1)
stackTrace = [
java.net.Socket$SocketInputStream.read(byte[], int, int) line: 68
java.net.Socket$SocketInputStream.read() line: 977
java.io.DataInputStream.readUnsignedShort() line: 341
java.io.DataInputStream.readUTF(DataInput) line: 593
java.io.DataInputStream.readUTF() line: 568
...
]
}
jdk.SocketRead {
startTime = 20:03:57.651
duration = 0.0617 ms
host = "localhost"
address = "127.0.0.1"
port = 5000
timeout = 0 s
bytesRead = 1 byte
endOfStream = false
eventThread = "main" (javaThreadId = 1)
stackTrace = [
java.net.Socket$SocketInputStream.read(byte[], int, int) line: 68
java.net.Socket$SocketInputStream.read() line: 977
java.io.DataInputStream.readUnsignedShort() line: 342
java.io.DataInputStream.readUTF(DataInput) line: 593
java.io.DataInputStream.readUTF() line: 568
...
]
}
jdk.SocketRead {
startTime = 20:03:57.652
duration = 0.0699 ms
host = "localhost"
address = "127.0.0.1"
port = 5000
timeout = 0 s
bytesRead = 10 bytes
endOfStream = false
eventThread = "main" (javaThreadId = 1)
stackTrace = [
java.net.Socket$SocketInputStream.read(byte[], int, int) line: 68
java.io.DataInputStream.readFully(byte[], int, int) line: 199
java.io.DataInputStream.readUTF(DataInput) line: 613
java.io.DataInputStream.readUTF() line: 568
pbouda.jfr.sockets.simple.Start.main(String[]) line: 49
...
]
}
my-message
If you want to see how those events are represented in JDK, then here is a clue:
xxxxxxxxxx
jdk.jfr.events.SocketReadEvent
jdk.jfr.events.SocketWriteEvent
Don't hesitate to navigate to where those events are emitted. JFR events are usually cached and reused later on. In this case, we can see that the object is stored in ThreadLocal
variable. The fields are always re-set, the event is emitted, and then the object can be used again by the same thread. Everything is done for the sake of being GC friendly.
What Interesting Stuff Can We See?
The socket events are very rich and we can see a lot of interesting information. The majority of those fields are very descriptive, just a note about the address/hostname/port. We can very easily see what other applications we are communicating to, don't we call any suspicious endpoint?
However, my favorite field is duration. We can see big differences between SocketRead
events in the snippet above. Why the very first events took more than 1 second and what does it actually mean?
We already know that the client has a 1-second delay between messages. However, the duration field is incredibly valuable because now we can see that something is blocking our thread (of course, we can see it even from the stacktrace because InputStream
is blocking by nature). What actually the thread does is that it waits until any data arrives to Receive Socket Buffer to be able to copy it into internal temporary DirectBuffer
. We can notice that the other operations were much faster because bytes were already in the socket buffer and we just read the exact number of bytes we needed.
Why Do I See Such a Long Duration?
If you use Java Mission Control to visualize your JFR events, then you very likely noticed that Socket I/O section contains incredible long durations and Total I/O Time. This is the result of having some blocking TCP connections opened and waiting for data. In the case below, it's RMI TCP Connection, but you can come across a very similar problem if you have let's say RabbitMQ and don't consume any messages.
Where Are the Events Emitted From My Endpoint?
You can be also surprised: Why don't I see any other traffic? Let's say we have an application with HTTP Endpoint, which is used very often. The reason is that you probably run on the default JFR configuration that says: only operations longer than 20ms will emit an event. That means that you can celebrate because your application very likely doesn't suffer any serious (I know it's relative, 20ms is not a small latency) Socket IO problems.
Run Netty-Based WebSocket Application
Let's proceed with the second application. As we already mentioned, it's a WebSocket application where the server broadcasts the messages to all connected clients. For this experiment, we're going to connect only one client and see what messages the client receives.
If you know Netty project, you probably noticed that Netty contains several implementations of SocketChannel
. Let's start with the JDK-based NIO Socket Channel.
Run a class that automatically starts the client and server: pbouda.jfr.sockets.netty.Start.
First, we can see four events belonging to establishing a WebSocket connection. Notice that we already have a meaningful eventThread
field that can help us a lot when we start to investigate any problems.
xxxxxxxxxx
jdk.SocketWrite {
startTime = 21:21:27.135
duration = 0.677 ms
host = ""
address = "127.0.0.1"
port = 8080
bytesWritten = 292 bytes
eventThread = "client-nioEventLoopGroup-0" (javaThreadId = 26)
stackTrace = [ sun.nio.ch.SocketChannelImpl.write(ByteBuffer) line: 138
io.netty.channel.socket.nio.NioSocketChannel.doWrite(ChannelOutboundBuffer) line: 405
io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() line: 931
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() line: 354
io.netty.channel.AbstractChannel$AbstractUnsafe.flush() line: 898
...
]
}
jdk.SocketRead {
startTime = 21:21:27.149
duration = 0.0479 ms
host = ""
address = "127.0.0.1"
port = 41244
timeout = 0 s
bytesRead = 292 bytes
endOfStream = false
eventThread = "server-nioEventLoopGroup-1" (javaThreadId = 27)
stackTrace = [
sun.nio.ch.SocketChannelImpl.read(ByteBuffer) line: 73
io.netty.buffer.PooledByteBuf.setBytes(int, ScatteringByteChannel, int) line: 247
io.netty.buffer.AbstractByteBuf.writeBytes(ScatteringByteChannel, int) line: 1147
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(ByteBuf) line: 347
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() line: 148
...
]
}
jdk.SocketWrite {
startTime = 21:21:27.177
duration = 0.102 ms
host = ""
address = "127.0.0.1"
port = 41244
bytesWritten = 129 bytes
eventThread = "server-nioEventLoopGroup-1" (javaThreadId = 27)
stackTrace = [
sun.nio.ch.SocketChannelImpl.write(ByteBuffer) line: 138
io.netty.channel.socket.nio.NioSocketChannel.doWrite(ChannelOutboundBuffer) line: 405
io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() line: 931
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() line: 354
io.netty.channel.AbstractChannel$AbstractUnsafe.flush() line: 898
...
]
}
jdk.SocketRead {
startTime = 21:21:27.183
duration = 0.0394 ms
host = ""
address = "127.0.0.1"
port = 8080
timeout = 0 s
bytesRead = 129 bytes
endOfStream = false
eventThread = "client-nioEventLoopGroup-0" (javaThreadId = 26)
stackTrace = [
sun.nio.ch.SocketChannelImpl.read(ByteBuffer) line: 73
io.netty.buffer.PooledByteBuf.setBytes(int, ScatteringByteChannel, int) line: 247
io.netty.buffer.AbstractByteBuf.writeBytes(ScatteringByteChannel, int) line: 1147
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(ByteBuf) line: 347
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() line: 148
...
]
}
Let's see the events related to our payload (it's not so easy to figure out what events belong to what log record but it does not matter so much right now).
We can notice that we have very often one event for reading and one for writing both containing 12 bytes (it can be also different, we can read just a part of the message and then WebSocket support in Netty needs to wait for another read/buffer to be able to parse the whole WebSocket frame).
Someone can be curious why did we transport 12 bytes in a case of the 10-byte message? There is an overhead of 2 bytes in case of messages smaller than 126 bytes, it's related to WebSocket framing mechanism.
Someone can also be surprised that we didn't experience any durations longer than 1-sec in the case of SocketRead
events as we did in the first example. The reason is that NIO is configured in a non-blocking mode, which means that it does not wait on incoming data (just copies what is currently in socket buffer into JDK DirectBuffer
and pass it to Netty and don't care about the number of bytes), and the current thread just tries to do something else instead (it can read from a different channel if we have multiple connections for one event-loop). So, we can see that our application is very likely healthy in terms of blocking threads; it can be a problem if you want to pass a C10K problem (handling thousands of connections in parallel).
xxxxxxxxxx
Broadcaster-Server 2020-01-14 21:21:29,181 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)
jdk.SocketWrite {
startTime = 21:21:29.454
duration = 0.0644 ms
host = ""
address = "127.0.0.1"
port = 41244
bytesWritten = 12 bytes
eventThread = "server-nioEventLoopGroup-1" (javaThreadId = 27)
stackTrace = [
sun.nio.ch.SocketChannelImpl.write(ByteBuffer) line: 138
io.netty.channel.socket.nio.NioSocketChannel.doWrite(ChannelOutboundBuffer) line: 405
io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() line: 931
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() line: 354
io.netty.channel.AbstractChannel$AbstractUnsafe.flush() line: 898
...
]
}
jdk.SocketRead {
startTime = 21:21:29.454
duration = 0.0197 ms
host = ""
address = "127.0.0.1"
port = 8080
timeout = 0 s
bytesRead = 12 bytes
endOfStream = false
eventThread = "client-nioEventLoopGroup-0" (javaThreadId = 26)
stackTrace = [
sun.nio.ch.SocketChannelImpl.read(ByteBuffer) line: 73
io.netty.buffer.PooledByteBuf.setBytes(int, ScatteringByteChannel, int) line: 247
io.netty.buffer.AbstractByteBuf.writeBytes(ScatteringByteChannel, int) line: 1147
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(ByteBuf) line: 347
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() line: 148
...
]
}
Show Me Some Possible Optimizations With Netty!
Netty allows us to drive our communication belonging to one connection very effectively. I prepared one example showing that we can very easily decrease the number of syscalls using a cache of messages in Netty. Simply, we decided to sacrifice a latency of one message and make our server application much more efficient by removing syscalls. Let's say we want to flush not every message but every fifth message to the given client.
We need to do two tiny changes:
xxxxxxxxxx
pbouda.jfr.sockets.netty.server.SlowConsumerDisconnectHandler
- we need to comment out flushing of every message and use simple write instead
- write method does not automatically write data into the socket, it waits for a flush context.writeAndFlush(obj) -> context.write(obj)
xxxxxxxxxx
pbouda.jfr.sockets.netty.Start#main
- uncomment the section at the end of the method `Flush a bulk of 5 messages`
Now, we can start the Netty app again. We can see that we received five messages in a bulk and transmit 60 bytes instead of 12 using one syscall/invocation.
xxxxxxxxxx
Broadcaster-Server 2020-01-14 22:12:00,937 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)
Broadcaster-Server 2020-01-14 22:12:00,937 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)
Broadcaster-Server 2020-01-14 22:12:00,938 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)
Broadcaster-Server 2020-01-14 22:12:00,938 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)
Broadcaster-Server 2020-01-14 22:12:00,939 [client-nioEventLoopGroup-0] INFO p.j.s.n.c.WebSocketClientHandler - Received message: my-message (10 bytes)
jdk.SocketWrite {
startTime = 22:12:01.603
duration = 2.23 ms
host = ""
address = "127.0.0.1"
port = 42556
bytesWritten = 60 bytes
eventThread = "server-nioEventLoopGroup-1" (javaThreadId = 27)
stackTrace = [
sun.nio.ch.SocketChannelImpl.write(ByteBuffer[], int, int) line: 167
io.netty.channel.socket.nio.NioSocketChannel.doWrite(ChannelOutboundBuffer) line: 420
io.netty.channel.AbstractChannel$AbstractUnsafe.flush0() line: 931
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0() line: 354
io.netty.channel.AbstractChannel$AbstractUnsafe.flush() line: 898
...
]
}
jdk.SocketRead {
startTime = 22:12:01.605
duration = 0.0757 ms
host = ""
address = "127.0.0.1"
port = 8080
timeout = 0 s
bytesRead = 60 bytes
endOfStream = false
eventThread = "client-nioEventLoopGroup-0" (javaThreadId = 26)
stackTrace = [
sun.nio.ch.SocketChannelImpl.read(ByteBuffer) line: 73
io.netty.buffer.PooledByteBuf.setBytes(int, ScatteringByteChannel, int) line: 247
io.netty.buffer.AbstractByteBuf.writeBytes(ScatteringByteChannel, int) line: 1147
io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(ByteBuf) line: 347
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read() line: 148
...
]
}
I Use Netty and Don't See Any Events! Why?
There are plenty of reasons, but one of them is a bit tricky. If you are running on Linux and you want to be very efficient, you can enable e-poll transport. It's a non-blocking native transport that can be used only on Linux. The big advantage of this transport is that if we have an application handling tens of thousands of connections in parallel, then, in the case of JDK NIO, we have to iterate over all connections to try if there is any data to read — we have O(N) to read data. Simple, e-poll is able to get the information about readable connections - O(1).
The problem is that the e-poll support is fully developed by Netty team and does not emit any JFR events. If you want to try it out, then:
xxxxxxxxxx
pbouda.jfr.sockets.netty.server.Server
- uncomment EpollEventLoopGroup instead of NioEventLoopGroup
- uncomment EpollServerSocketChannel instead of NioServerSocketChannel
Summary
JFR events can be a very powerful tool to find weaknesses in our applications. It's worth studying what it offers to us and be ready to use it when we run into any problem. In general, it means:
SocketRead
events: We are waiting for incoming data. Our Receive Socket Buffer is very likely empty. In the case of non-blocking transport, we don't receive any events because of the non-blocking nature of the implementation. In the case of blocking transport, we can receive events but it can be some kind of connection that is just idle and waits for any data.
SocketWrite
events: The situation is just reversed. We can block in case of a blocking transport if the Send Socket Buffer is full. In the case of non-blocking transport, we can do partial write (write just bytes that fit to Send Socket Buffer) and the rest of bytes need to be cached by our application.
If you want to try more events (even other types), then check out this repo jfr-playground. Otherwise, thank you for reading my article and please leave comments below. If you would like to be notified about new posts, then start following me on Twitter!
Further Reading
Opinions expressed by DZone contributors are their own.
Comments