Continuous monitoring of pinned threads with Spring Boot and JFR
In my previous article about Java Virtual Threads, I focused on different pinning cases caused by the synchronized construct. Some of the presented cases were far from evident just by looking at the code. That's why I believe that actively monitoring our apps for pinning is a must.
We don't know if the pinning happens in our app if we're not monitoring it. We don't know how frequently it occurs. We don't know for how long our threads are pinned. Without monitoring, we can only speculate about how pinning affects our application's performance. I prefer measuring to guessing.
In this article, I’d like to show how you can monitor your Spring Boot apps for pinned threads. Using JFR (Java Flight Recorder) Event Streaming will allow us to detect thread pinning cases close to real-time with a minimal performance footprint. While the presented solution will rely on Spring Boot, porting it to your framework of choice should be straightforward. Those who are impatient can find the complete example on my GitHub.
Note: If you haven’t heard about the JFR before, have a quick look at the JDK Mission Control User Guide first.
Let’s start simple, to better understand why we need to complicate our lives in the first place.
Simple pinned threads logging
Probably the easiest way of monitoring thread pinning is passing the -Djdk.tracePinnedThreads=full command line option (e.g. java -Djdk.tracePinnedThreads=full -jar app.jar). When pinning occurs, the JVM simply prints out the stack trace, helping us understand where it came from. Such stack traces look like this:
Thread[#26,ForkJoinPool-1-worker-1,5,CarrierThreads]
    java.base/java.lang.VirtualThread$VThreadContinuation.onPinned(VirtualThread.java:185)
    java.base/jdk.internal.vm.Continuation.onPinned0(Continuation.java:393)
    java.base/java.lang.VirtualThread.parkNanos(VirtualThread.java:631)
    java.base/java.lang.VirtualThread.sleepNanos(VirtualThread.java:803)
    java.base/java.lang.Thread.sleep(Thread.java:507)
    com.mikemybytes.playground.JfrEventsApp$Synchronizer.sync(JfrEventsApp.java:65) <== monitors:1
    com.mikemybytes.playground.JfrEventsApp.lambda$main$1(JfrEventsApp.java:42)
    java.base/java.lang.VirtualThread.run(VirtualThread.java:311)
What I like about this approach is its extreme simplicity. There’s no need to write even a single line of Java code!
Yet, when I tried this approach with one of my Spring Boot apps running on Java 21, the application started to hang unexpectedly. This turned out to be a known problem caused by a JDK issue. The fix shipped with Java 22, but those stuck on version 21 might need to look further.
Unfortunately, this simple approach has a few limitations. First, it tells us nothing about the duration of the pinning: the longer it takes, the higher its potential performance impact. Second, as pointed out by Todd Ginsberg, this approach seems to report only the first occurrence for each location. In other words, every unique pinning-related stack trace will be logged only once. As a consequence, we may not be able to count the number of occurrences or correlate them with observed performance issues at a specific point in time. Last but not least, we don't have any control over how such events are logged.
Towards continuous monitoring
I'm happily running some Virtual Threads-powered apps already. What I've been looking for is a way of detecting thread pinning in real time. I wanted to know the exact duration of pinning events and the details about their cause. I also wanted to be able to present pinning on my Grafana dashboard, so whenever something bad happens I could simply look for a correlation. Finally, I wanted to integrate this solution with the existing Spring Boot infrastructure.
Together with Virtual Threads, Java 21 came with a new JFR event called VirtualThreadPinned. This event is emitted whenever pinning occurs. It may also contain metadata like the stack trace (to be more precise, the recorded frames) and the duration of the event. That's why it felt like a perfect candidate for my monitoring solution.
Java 14 introduced JFR (Java Flight Recorder) Event Streaming, which allows consuming the JFR events from within the application itself. This gives us a lot of flexibility when it comes to processing these events. In our case, we would log them using our logger of choice and expose their duration as an observable metric.
In this post, I'll guide you through the most important building blocks of this solution. The full code of the example jfr-thread-pinning-spring-boot application can be found on GitHub. This simple Spring Boot app exposes a /pinning endpoint which intentionally causes thread pinning via synchronized, so we can see the monitoring solution in action.
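For illustration, here is a minimal sketch of what such an endpoint could look like. It assumes the app serves requests on virtual threads (e.g. via spring.threads.virtual.enabled=true); the actual ThreadPinningController in the repository may differ:

package com.mikemybytes.jfr.pinning;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RestController;

import java.time.Duration;

@RestController
class ThreadPinningController {

    private final Object lock = new Object();

    @PostMapping("/pinning")
    void pinCarrierThread() {
        // on JDK 21, parking (here: sleeping) inside a synchronized block
        // pins the virtual thread to its carrier thread
        synchronized (lock) {
            sleep(Duration.ofMillis(250));
        }
    }

    private void sleep(Duration duration) {
        try {
            Thread.sleep(duration);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}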
Recording JFR events in the background
The first step is to start listening to the JFR events from within the Spring Boot app. Implementing the SmartLifecycle interface allows us to start/stop the recording session together with our application. Let's have a look at the code:
package com.mikemybytes.jfr;

import jdk.jfr.consumer.RecordingStream;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.SmartLifecycle;

import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;

public class JfrEventLifecycle implements SmartLifecycle {

    private static final Logger log = LoggerFactory.getLogger(JfrEventLifecycle.class);

    private final AtomicBoolean running = new AtomicBoolean(false);

    private RecordingStream rs;

    @Override
    public void start() {
        if (!isRunning()) {
            running.set(true);
            rs = new RecordingStream(); // (1)
            rs.enable("jdk.VirtualThreadPinned").withStackTrace(); // (2)
            rs.onEvent(
                    "jdk.VirtualThreadPinned",
                    ev -> log.warn("Thread pinning detected! {}", ev) // (3)
            );
            // prevents long-running app from memory leaks
            rs.setMaxAge(Duration.ofSeconds(10)); // (4)
            rs.startAsync(); // (5)
        }
    }

    @Override
    public void stop() {
        if (isRunning()) {
            rs.close();
            running.set(false);
        }
    }

    @Override
    public boolean isRunning() {
        return running.get();
    }
}
The start() method is the most interesting one. We first initialize the JFR recording session by creating a RecordingStream (1). At this point, we are not listening to any JFR events yet.
With (2) we're explicitly enabling the jdk.VirtualThreadPinned event as something we're interested in. Using .withStackTrace() ensures that the stack trace will be recorded as well.
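As a side note, enable() returns an EventSettings object that also allows filtering events by duration. If very short pinning events ever turn out to be noise, a threshold could be added right there in start(); a sketch (the 20 ms cutoff is an arbitrary value of mine, not part of the example app):

rs.enable("jdk.VirtualThreadPinned")
        .withStackTrace()
        .withThreshold(Duration.ofMillis(20)); // skip pinning events shorter than 20 ms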
With (3) we are providing a simple handler function. It simply logs the received RecordedEvent object containing the details about the recorded JFR event. We will complicate this step a bit more later.
Step (4) makes sure that our app won't be accumulating events forever, which could turn into a memory leak. A window of 10 seconds should be just enough for our purposes. We don't need to store the history, as we need the recorded events only for a very short time.
Finally, with (5) we are starting the asynchronous recording session that doesn't block further execution. Our app is now listening to the defined set of JFR events.
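One caveat: SmartLifecycle callbacks are invoked only for Spring-managed beans, so JfrEventLifecycle has to be registered in the application context. A minimal sketch, assuming a dedicated configuration class (the example repository may wire it differently):

package com.mikemybytes.jfr;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class JfrConfiguration {

    @Bean
    JfrEventLifecycle jfrEventLifecycle() {
        return new JfrEventLifecycle();
    }
}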
Calling POST /pinning at this point causes an intentional pinning event that gets printed out as expected. Yet, the output looks slightly different from what we would get with -Djdk.tracePinnedThreads=full:
2024-04-14T16:58:30.439+02:00 WARN 14151 --- [jfr-thread-pinning-spring-boot] [ Event Stream 1] c.m.jfr.pinning.JfrEventLifecycle : Thread pinning detected! jdk.VirtualThreadPinned {
  startTime = 16:58:30.051 (2024-04-14)
  duration = 251 ms
  eventThread = "web-vt-729a03c1-5456-4686-90b2-b2659291752f" (javaThreadId = 69, virtual)
  stackTrace = [
    java.lang.VirtualThread.parkOnCarrierThread(boolean, long) line: 677
    java.lang.VirtualThread.parkNanos(long) line: 636
    java.lang.VirtualThread.sleepNanos(long) line: 793
    java.lang.Thread.sleep(long) line: 507
    com.mikemybytes.jfr.pinning.ThreadPinningController.sleep(Duration) line: 36
    ...
  ]
}
As promised earlier, we finally see how long the pinning took (duration). We can find the exact thread (eventThread) that caused that event too. Unfortunately, the stackTrace is limited to the top 5 entries by default. In many cases, this won't be enough. Luckily, we can customize the output according to our needs.
If you analyze the whole log entry carefully, you will notice a delay between the recorded event time (16:58:30.051) and our log entry (16:58:30.439). That delay is why this solution is close to real-time monitoring, but not exactly real-time. In my opinion, the delay is small enough for our purposes (it's usually close to 500ms). We just have to be aware of it when analyzing the collected pinning data.
Customizing pinning event data
Let's extract the event handling part to a dedicated JfrVirtualThreadPinnedEventHandler. This way we will decouple the event-specific logic from the JFR recording session itself.
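On the JfrEventLifecycle side, this boils down to injecting the handler and delegating to it instead of the inline logging lambda; a sketch of the relevant fragment:

private final JfrVirtualThreadPinnedEventHandler handler;

JfrEventLifecycle(JfrVirtualThreadPinnedEventHandler handler) {
    this.handler = handler;
}

// ...and inside start():
rs.onEvent("jdk.VirtualThreadPinned", handler::handle);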
Since we already know that presenting only the top 5 stack trace entries won't be enough, we should tweak the output a bit. Unfortunately, the RecordedObject class (which RecordedEvent extends) hardcodes the stack depth. We also can't use the PrettyWriter class that is used under the hood, as it's not exposed by the jdk.jfr module… As a result, we have to provide our own stack trace formatting method.
Additionally, we want our handler class to expose the pinning duration as a Micrometer Timer. With Prometheus and Grafana integration in place, we could then present it together with other application metrics. Since a Timer acts as a Counter too, we also get the number of registered events.
The implementation of our handler could look like this:
package com.mikemybytes.jfr.pinning;

import io.micrometer.core.instrument.MeterRegistry;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.stream.Collectors;

@Component
class JfrVirtualThreadPinnedEventHandler {

    // improvement idea: expose as a configuration
    private static final int STACK_TRACE_MAX_DEPTH = 25;

    private final Logger log = LoggerFactory.getLogger(JfrVirtualThreadPinnedEventHandler.class);

    private final MeterRegistry meterRegistry;

    JfrVirtualThreadPinnedEventHandler(MeterRegistry meterRegistry) {
        this.meterRegistry = meterRegistry;
    }

    void handle(RecordedEvent event) {
        // marked as nullable in Javadoc
        var thread = event.getThread() != null ? event.getThread().getJavaName() : "<unknown>";
        var duration = event.getDuration();
        var startTime = LocalDateTime.ofInstant(event.getStartTime(), ZoneId.systemDefault());
        var stackTrace = formatStackTrace(event.getStackTrace(), STACK_TRACE_MAX_DEPTH);
        log.warn(
                "Thread '{}' pinned for: {}ms at {}, stacktrace: \n{}",
                thread,
                duration.toMillis(),
                startTime,
                stackTrace
        );
        var timer = meterRegistry.timer("jfr.thread.pinning");
        timer.record(duration);
    }

    private String formatStackTrace(RecordedStackTrace stackTrace, int maxDepth) {
        if (stackTrace == null) {
            return "\t<not available>";
        }
        String formatted = "\t" + stackTrace.getFrames().stream()
                .limit(maxDepth)
                .map(JfrVirtualThreadPinnedEventHandler::formatStackTraceFrame)
                .collect(Collectors.joining("\n\t"));
        if (maxDepth < stackTrace.getFrames().size()) {
            return formatted + "\n\t(...)"; // truncated
        }
        return formatted;
    }

    private static String formatStackTraceFrame(RecordedFrame frame) {
        return frame.getMethod().getType().getName() + "#" + frame.getMethod().getName()
                + ": " + frame.getLineNumber();
    }
}
If we cause thread pinning via the endpoint once more, we will see our custom output formatting in action:
2024-04-14T17:18:32.460+02:00 WARN 14322 --- [jfr-thread-pinning-spring-boot] [ Event Stream 1] m.j.p.JfrVirtualThreadPinnedEventHandler : Thread 'web-vt-01aad770-6783-4b53-b8a9-21e5a812fa0e' pinned for: 250ms at 2024-04-14T17:18:32.095766166, stacktrace:
    java.lang.VirtualThread#parkOnCarrierThread: 677
    java.lang.VirtualThread#parkNanos: 636
    java.lang.VirtualThread#sleepNanos: 793
    java.lang.Thread#sleep: 507
    com.mikemybytes.jfr.pinning.ThreadPinningController#sleep: 36
    com.mikemybytes.jfr.pinning.ThreadPinningController#lambda$pinCarrierThread$0: 27
    com.mikemybytes.jfr.pinning.ThreadPinningController$$Lambda+0x000000b00155f6d8.1266805443#run: -1
    java.lang.Thread#runWith: 1596
    java.lang.VirtualThread#run: 309
    java.lang.VirtualThread$VThreadContinuation$1#run: 190
    jdk.internal.vm.Continuation#enter0: 320
    jdk.internal.vm.Continuation#enter: 312
    jdk.internal.vm.Continuation#enterSpecial: -1
Just as intended, we are no longer limited to only the top 5 stack trace entries. Yet, it’s important to have a reasonable limit here, as for some recursive calls the stack trace could be very, very long.
Last but not least, let's examine the /actuator/prometheus endpoint output, looking for our jfr.thread.pinning timer. Keep in mind that dots will be replaced with underscores here, so we're looking for jfr_thread_pinning:
(...)
# HELP jfr_thread_pinning_seconds_max
# TYPE jfr_thread_pinning_seconds_max gauge
jfr_thread_pinning_seconds_max 0.25093325
# HELP jfr_thread_pinning_seconds
# TYPE jfr_thread_pinning_seconds summary
jfr_thread_pinning_seconds_count 1.0
jfr_thread_pinning_seconds_sum 0.25093325
(...)
I'll leave causing thread pinning multiple times and observing how the output changes as an exercise for you 😉
Next steps
The presented solution is just a starting point. You can play with the logging output as much as you want. You may introduce a Micrometer histogram to get more detailed data about the registered pinning occurrences over time, as sketched below. You will also need to extend your dashboards, so the recorded data is visible there.
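For example, the timer could publish a percentile histogram when registered; a minimal sketch of what that change in the handler might look like (bucket configuration left at Micrometer's defaults):

// in JfrVirtualThreadPinnedEventHandler.handle(), instead of meterRegistry.timer(...)
// requires: import io.micrometer.core.instrument.Timer;
var timer = Timer.builder("jfr.thread.pinning")
        .description("Duration of detected virtual thread pinning events")
        .publishPercentileHistogram() // publishes _bucket series for percentile queries
        .register(meterRegistry);
timer.record(duration);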
Also, you may want to come up with a custom "signature" for each location causing pinning. It could then be used as a tag for each event reported via metrics. This way, you could easily group pinning occurrences coming from the same location. Such a signature could be a simple hash over the recorded stack trace, or something more fancy; see the sketch below.
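A minimal sketch of such a signature, here a simple hash over the class and method names of the recorded frames (the helper name is mine, not from the repository):

private static String pinningSignature(RecordedStackTrace stackTrace) {
    if (stackTrace == null) {
        return "unknown";
    }
    String frames = stackTrace.getFrames().stream()
            .map(f -> f.getMethod().getType().getName() + "#" + f.getMethod().getName())
            .collect(Collectors.joining("|"));
    // a hex-encoded hashCode keeps the tag short; good enough for grouping on a dashboard
    return Integer.toHexString(frames.hashCode());
}

// then, when recording the metric (keep an eye on tag cardinality):
meterRegistry.timer("jfr.thread.pinning", "signature", pinningSignature(event.getStackTrace()))
        .record(event.getDuration());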
Finally, there are many other JFR events available for collecting data about resource consumption, GC activity, etc. With all the infrastructure we already created, adding support for additional events should be straightforward.
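For instance, subscribing to the standard jdk.GarbageCollection event is just another pair of enable/onEvent calls on the same RecordingStream; a sketch (the log message is mine):

rs.enable("jdk.GarbageCollection");
rs.onEvent("jdk.GarbageCollection", ev ->
        log.info("GC '{}' took {} ms", ev.getString("name"), ev.getDuration().toMillis()));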