Description
Describe the bug
Hi guys,
I love async-profiler, so thank you all sincerely for the work you do!
Anyway, to the bug report :) When I profile the Renaissance sub-benchmark akka-uct on Amazon EC2 m6a.metal instances, I fairly reliably get a HotSpot SIGSEGV (see hs_err_pid4147.log for the full report):
C [libasyncProfiler.so+0x18109] PerfEvents::walk(int, void*, void const**, int, StackContext*) [clone .constprop.1221]+0x29
It happens on other instance types too (m6i.metal, m7a.metal, m7g.metal, m7i.metal), so it affects both x86 and arm64. But it's most common on the m6a.metal instances, which are AMD EPYC 7R13 machines.
The error is somewhat intermittent. When I run it on most instance types, it might fail once or twice, but it'll eventually work. On m6a.metal, it's never run to completion for me.
I've tried to give a very complete description below, but please let me know if you need any more info.
Thanks!
Expected vs. actual behavior
Expected behavior is for the profile to run to completion. Actual behavior is SIGSEGV after some amount of time. The unsuccessful output of the benchmark looks like this:
====== akka-uct (concurrency) [default], iteration 0 started ======
GC before operation: completed in 24.973 ms, heap usage 62.562 MB -> 11.223 MB.
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000079116ac9d109, pid=20286, tid=22946
#
# JRE version: OpenJDK Runtime Environment Temurin-24.0.1+9 (24.0.1+9) (build 24.0.1+9)
# Java VM: OpenJDK 64-Bit Server VM Temurin-24.0.1+9 (24.0.1+9, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C [libasyncProfiler.so+0x18109] PerfEvents::walk(int, void*, void const**, int, StackContext*) [clone .constprop.1221]+0x29
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E" (or dumping to /home/ubuntu/core.20286)
#
# An error report file with more information is saved as:
# /home/ubuntu/hs_err_pid20286.log
#
# If you would like to submit a bug report, please visit:
# https://github.com/adoptium/adoptium-support/issues
Successful output looks like this:
====== akka-uct (concurrency) [default], iteration 0 started ======
GC before operation: completed in 16.653 ms, heap usage 56.787 MB -> 8.097 MB.
final size= 199991
final height= 10
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
====== akka-uct (concurrency) [default], iteration 0 completed (12863.032 ms) ======
< successful iterations deleted >
====== akka-uct (concurrency) [default], iteration 11 started ======
GC before operation: completed in 321.469 ms, heap usage 4.100 GB -> 103.590 MB.
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
final size= 199991
final height= 9
====== akka-uct (concurrency) [default], iteration 11 completed (10860.299 ms) ======
Reproduction Steps
Here are the reproduction instructions, starting from a totally empty new EC2 instance. It may also work on an already-set-up Ubuntu machine, but I haven't tried that.
- Provision a fresh Amazon EC2 m6a.metal instance with the standard Ubuntu 24.04 AMI. Give it an EBS volume of 20 GB or so, to hold the JDK and benchmark we're going to install on it.
- Copy this setup script to the machine: set-up-instance.txt
- Log into the machine, type mv set-up-instance.txt set-up-instance.sh, then source set-up-instance.sh. This will install Java and async-profiler.
- source ~/.profile to get the JDK on your path (the setup script adds a line to the .profile)
- Download the benchmark with
curl -L -O "https://github.com/renaissance-benchmarks/renaissance/releases/download/v0.16.0/renaissance-gpl-0.16.0.jar"
- Run the benchmark with
java -agentpath:/home/ubuntu/async-profiler-4.0-linux-x64/lib/libasyncProfiler.so=start,event=cpu,file=profile.html -jar renaissance-gpl-0.16.0.jar akka-uct -t 120 --json results.json
- Observe the failure :) If you don't see it at first, try running again, or increasing -t 120 to a larger value to run longer.
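Since the failure is intermittent, a small retry loop may save some manual rerunning. This is only a sketch, not something I used for the runs above: run_until_crash is a hypothetical helper name, and it assumes the JVM writes its hs_err_pid*.log into the current working directory (as in the failing output above) and that no older hs_err files are already sitting there.

```shell
# Rerun a command until HotSpot leaves an hs_err_pid*.log in the current directory.
# Hypothetical helper; usage: run_until_crash MAX_ATTEMPTS COMMAND [ARGS...]
run_until_crash() {
  max_attempts=$1
  shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    echo "attempt ${attempt}/${max_attempts}"
    # Ignore the exit status; the crash is detected via the error report file.
    "$@" || true
    if ls hs_err_pid*.log > /dev/null 2>&1; then
      echo "crash reproduced on attempt ${attempt}"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "no crash in ${max_attempts} attempts"
  return 1
}
```

For example, run_until_crash 5 followed by the full java command from the step above would repeat the benchmark up to five times and stop at the first run that produces an error report.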
Additional Information/Context
No response
Async-profiler version
4.0
Environment details
OS: Ubuntu 24.04, using Amazon's standard image at the path /aws/service/canonical/ubuntu/server/24.04/stable/current/
JDK: OpenJDK24U-jdk_x64_linux_hotspot_24.0.1_9
CPU: AMD EPYC 7R13 48-Core Processor (so amd64)
The application is running on an AWS EC2 metal instance, not in a container.