Method Tracing
#1497
@apangin, thanks for this great post and the work behind it! A question about this note:
As an async-profiler user, how do I understand that this particular profiling session needs this option? Should I notice some signs of its applicability, or maybe just use it if the defaults don't work?
Background
Async-profiler is often described as a low-overhead sampling profiler for Java. Sampling means the profiler collects statistical information about program execution by periodically interrupting it to get stack traces. While this approach has minimal performance impact on the monitored application, the main trade-off is its inability to get an exact count of method invocations or the duration of each invocation.
If a method is missing from a sampling profile, it does not mean the method was never executed. In cases where it is important to know the exact number of calls, or to discover whether a particular code path is ever taken, instrumenting profilers come to the rescue. In this post, I will showcase the method tracing capabilities of async-profiler, including the recently added latency profiling feature.
Basic instrumentation
Async-profiler has had instrumentation capabilities since version 1.7, released back in 2020. You can specify a Java method to instrument with the `-e` option, and all invocations of the given method will be recorded. This may be either a JDK internal method or any application method, public or private.

Let's say we are chasing the cause of mysterious Full GC events in the logs. It looks as if someone is calling `System.gc`, and async-profiler can tell exactly where it is called from:
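A minimal sketch of such a command, assuming the `asprof` launcher from async-profiler 3.x; the PID and the 60-second duration are placeholders:

```
# Trace every System.gc invocation for 60 seconds and write a flame graph
asprof -d 60 -e java.lang.System.gc -f gc-callers.html <pid>
```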
This command collects stack traces of all `System.gc` calls and summarizes them on a flame graph.

Surprisingly, `System.gc` is called right in the OpenJDK core library under certain out-of-memory conditions. By the way, to trace the source of such out-of-memory errors, you can use the same async-profiler feature: just instrument the `OutOfMemoryError` constructor:
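A sketch of what this could look like, assuming constructors are addressed by their JVM name `<init>` (the PID is again a placeholder):

```
# Record every OutOfMemoryError construction with Java and native frames interleaved
asprof -d 60 -e 'java.lang.OutOfMemoryError.<init>' --cstack vmx -f oom.html <pid>
```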
Note the `--cstack vmx` option: it enables the most detailed stack walking mode, showing all Java and native frames interleaved. In this example, `OutOfMemoryError` is thrown from the native implementation of the JDK-internal `map0` method.

You can use the `*` wildcard to profile more than one method at a time. For instance, `-e java.io.FileInputStream.read*` will match the `read`, `readNBytes` and `readAllBytes` methods. More generally, `-e java.io.FileInputStream.*` will instrument all methods of the `FileInputStream` class. You can find out what other types of instrumentation async-profiler supports in my Devoxx UK talk.
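For example, a sketch of tracing all `read` variants for 30 seconds (the quotes protect the wildcard from the shell; the PID is a placeholder):

```
# Instrument read, readNBytes and readAllBytes in one session
asprof -d 30 -e 'java.io.FileInputStream.read*' -f reads.html <pid>
```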
Latency tracing
Although the Java method profiling feature has existed for a long time, it could only trace method invocations, not the duration of calls. The `latency` option, introduced in #1421, closes this gap. It is already available in nightly builds.

As before, the `-e` argument (or `event` when starting the profiler as an agent) selects a method to instrument. Now, with the `latency` option, async-profiler will record the duration of each call. When combined with the `total` option, the flame graph accumulates total duration instead of the invocation count. Here is an example of how to count the total time spent inside `ByteBuffer.allocateDirect`:
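A sketch in agent form; `event` and `file` are standard agent options, while the bare `latency` and `total` flags simply mirror the option names described above, so the exact spelling may differ in the nightly build (MyApp is a placeholder):

```
# Accumulate total time (not call counts) spent in ByteBuffer.allocateDirect
java -agentpath:/path/to/libasyncProfiler.so=start,event=java.nio.ByteBuffer.allocateDirect,latency,total,file=alloc.html MyApp
```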
To get the most out of this feature, use latency profiling with the JFR output format. In this case, async-profiler records every single invocation of the given method along with the event timestamp, duration and a stack trace. You can then analyze the recording in JDK Mission Control (JMC) on the Event Browser tab - look for Method Trace events.
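If you prefer the command line over JMC, the `jfr` tool shipped with the JDK can dump the same events; this assumes the recording uses the standard `jdk.MethodTrace` event, as discussed later in the post:

```
# Print recorded method trace events from the JFR file
jfr print --events jdk.MethodTrace profile.jfr
```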
But what if the method of interest is called millions of times? To simplify analysis and, importantly, to reduce the performance overhead of method tracing, you can filter only those method calls that run longer than a specified amount of time - just pass the duration threshold as the `latency` argument.

For example, we want to find usages of `ArrayList.contains`, which is generally discouraged because of its linear complexity. However, there is no problem with this method at all when a list contains only a few elements. Our goal will be to find `ArrayList.contains` calls that take longer than 50 microseconds:
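Another sketch in agent form; the `latency=50us` threshold syntax is an assumption based on the description above, and the `.jfr` file extension selects JFR output (MyApp is a placeholder):

```
# Record only ArrayList.contains calls slower than 50 microseconds into a JFR file
java -agentpath:/path/to/libasyncProfiler.so=start,event=java.util.ArrayList.contains,latency=50us,file=contains.jfr MyApp
```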
Opening the profile in JMC, we see a handful of Method Trace events and can quickly dive deep into the longest ones.
In this screenshot, the slowest `ArrayList.contains` appears to be called from `AbstractSet.removeAll`. This is not a unique problem, actually: depending on the relative size of the collections, the `removeAll` implementation may iterate either over the original set or over the specified collection. If you are unlucky and the list is larger than the set, the algorithm complexity degrades to O(N*M). If you have ever experienced the same issue, the fix is to replace `set.removeAll(list)` with `list.forEach(set::remove)`.

Performance implications
Now that you've seen the capabilities of Method Tracing, I anticipate the question: what about overhead? Is it safe to enable tracing in production?
In short, Method Tracing was designed with high performance in mind. The feature relies on bytecode instrumentation: for every traced method, async-profiler inserts a timestamp call at the beginning of the method and another call at every exit point. Such instrumentation does not prevent the method from being JIT-compiled, and it typically adds just a few nanoseconds. When the method duration exceeds the latency threshold, a JNI call is made to collect a stack trace and record the event. This is the most expensive part of tracing. Hence the rule: choose the `latency` parameter wisely to avoid excessive tracing of short-running methods. For an average application, recording about 100 calls per second does not cause any noticeable performance impact.

Another approach to reduce the overhead is to set a non-zero `interval`. E.g., `interval=20` means async-profiler will record only one of every 20 method calls. This can be useful to collect a latency histogram without paying a penalty on every invocation.
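As a sketch of the agent form, assuming `interval` combines with the other method-tracing options as described (MyApp is a placeholder):

```
# Record the latency of every 20th FileInputStream.read call
java -agentpath:/path/to/libasyncProfiler.so=start,event=java.io.FileInputStream.read,latency,interval=20,file=reads.jfr MyApp
```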
Method Tracing in JDK

Async-profiler supports method tracing with all JDK versions starting from 8. In the meantime, JDK 25 comes with a new built-in feature that looks very similar at first glance. JEP 520: JFR Method Timing & Tracing adds two JFR events: `jdk.MethodTrace` and `jdk.MethodTiming`. In fact, async-profiler adopted the same event for method traces, aiming to become compatible with external tools once they support this new feature. There are several important differences, though.
In JEP 520, the stack trace of the `jdk.MethodTrace` event starts from the caller method. I deem it an unfortunate design flaw to omit the traced method itself, and here is why.

Besides just the method ID, `StackFrame` carries a bunch of additional details, such as the frame type. At different times, the traced method can be interpreted, compiled or inlined. Discovering its compilation status is useful for understanding a possible cause of latency outliers, which may be related, for instance, to deoptimization issues.

If the target method has multiple exits, it is good to know exactly which one is taken. For example, this allows comparing the latency of the "cache hit" path with the "cache miss" branch. Again, `method` alone is not enough to do so, as opposed to `StackFrame`, which has `lineNumber` and `bytecodeIndex` fields.

With the traced method included in the stack trace, the Flame Graph visualization works out of the box. Without it, tool developers need to special-case the `jdk.MethodTrace` event and append a synthetic frame with the traced method themselves.

For the above reasons, we decided to include the target method in async-profiler's stack traces, so that the additional information becomes immediately visible in JMC.
Like many other JFR events, `jdk.MethodTrace` has a few settings that can be tuned via the command line or in a .jfc configuration file. JEP 520 mentions `threshold`, intended to trace only methods taking longer than the specified amount of time. In practice, however, changing `threshold` has no effect, indicating the setting has not been implemented yet. This means JFR Method Tracing is not currently suitable for high-volume events.
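For reference, a sketch of how these settings would be passed on the command line; the `jdk.MethodTrace#filter` syntax follows JEP 520, but the filter value and the `50us` threshold are illustrative (MyApp is a placeholder):

```
# Trace ArrayList.contains with JFR Method Tracing (JDK 25)
# Per the observation above, the threshold setting is accepted but currently has no effect
java -XX:StartFlightRecording:jdk.MethodTrace#filter=java.util.ArrayList::contains,jdk.MethodTrace#threshold=50us,filename=trace.jfr MyApp
```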
There is one more interesting peculiarity of JDK's method traces: they treat all `throw` instructions as exit points, even if the exception is caught inside the same method. Consequently, all invocations of such methods are counted twice. I submitted OpenJDK issues for both problems: JDK-8367948 and JDK-8367949.
Notwithstanding this, I'm excited to see OpenJDK development moving in the same direction. It only emphasizes the relevance of Java observability tooling that combines both sampling and instrumenting profilers.
Conclusion
Method Tracing in async-profiler complements traditional sampling by providing exact invocation counts, call paths, and latency insights, all with carefully designed performance safeguards. With some caveats, JDK Flight Recorder also offers a Method Timing & Tracing feature since JDK 25.