-
Notifications
You must be signed in to change notification settings - Fork 922
Closed
Labels
Description
Describe the bug
- bug 1, crash
If use asprof to start and stop, the stop step is to runstop,file=/tmp/asprof.31081.1403,log=/tmp/asprof-log.31081.1403
andLog::open
is invoked.Log::open
closes the FILE* _file created bystart
.
If threads are created frequently, thenPerfEvents::createForThread
is called frequently. Ifmmap
failed due to/proc/sys/kernel/perf_event_mlock_kb
limit (should use non-root user, see Reproduction Steps bellow), then async-profiler printsperf_event mmap failed
to log file.
fprintf
to a closedFILE*
is undefined. In my case, this leads to crash. The stack trace may be one of the following 3 kinds.
#0 0x00007f9d03360387 in raise () from /lib64/libc.so.6
#1 0x00007f9d03361a78 in abort () from /lib64/libc.so.6
#2 0x00007f9d02bbbb29 in os::abort(bool) () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#3 0x00007f9d02de2b9c in VMError::report_and_die() () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#4 0x00007f9d02bc5ce5 in JVM_handle_linux_signal () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#5 0x00007f9d02bb8bd8 in signalHandler(int, siginfo_t*, void*) () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#6 0x00007f9cb8943949 in Profiler::segvHandler (signo=11, siginfo=0x7f9cba9e75f0, ucontext=0x7f9cba9e74c0) at src/profiler.cpp:893
#7 <signal handler called>
#8 0x00007f9d03372802 in vfprintf () from /lib64/libc.so.6
#9 0x00007f9d0337d447 in fprintf () from /lib64/libc.so.6
#10 0x00007f9cb893c15a in Log::log (level=LOG_WARN, msg=0x7f9cb898c656 "perf_event mmap failed: %s", args=0x7f9cba9e8cc8) at src/log.cpp:87
#11 0x00007f9cb893c355 in Log::warn (msg=0x7f9cb898c656 "perf_event mmap failed: %s") at src/log.cpp:116
#12 0x00007f9cb893f7dc in PerfEvents::createForThread (this=0x7f9cb8bb09a8 <perf_events>, tid=10811) at src/perfEvents_linux.cpp:612
#13 0x00007f9cb8928bd7 in CpuEngine::onThreadStart () at src/cpuEngine.cpp:47
#14 0x00007f9cb8928b7f in pthread_setspecific_hook (key=1, value=0x7f9c64031000) at src/cpuEngine.cpp:36
#15 0x00007f9d02d916b1 in ThreadLocalStorage::set_thread(Thread*) () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#16 0x00007f9d02bba9a1 in java_start(Thread*) () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#17 0x00007f9d03d28ea5 in start_thread (arg=0x7f9cba9e9700) at pthread_create.c:307
#18 0x00007f9d03428b0d in clone () from /lib64/libc.so.6
*** Error in `java': corrupted double-linked list: 0x00007f441802d630 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x7f44ffc26474]
/lib64/libc.so.6(+0x82c38)[0x7f44ffc29c38]
/lib64/libc.so.6(__libc_malloc+0x4c)[0x7f44ffc2c78c]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x65c4d)[0x7f43ecd7ec4d]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x5646a)[0x7f43ecd6f46a]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x50afc)[0x7f43ecd69afc]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x483a7)[0x7f43ecd613a7]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x43ace)[0x7f43ecd5cace]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x40590)[0x7f43ecd59590]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x10116)[0x7f43ecd29116]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x2f347)[0x7f43ecd48347]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x2e584)[0x7f43ecd47584]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x3039b)[0x7f43ecd4939b]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x30802)[0x7f43ecd49802]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(Agent_OnAttach+0xe3)[0x7f43ecd55d90]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so(+0x7c12d9)[0x7f44ff2b62d9]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so(+0x3328df)[0x7f44fee278df]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so(+0xb11d32)[0x7f44ff606d32]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so(+0x942a62)[0x7f44ff437a62]
/lib64/libpthread.so.0(+0x7ea5)[0x7f45005a5ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f44ffca5b0d]
Fatal error: glibc detected an invalid stdio handle
Aborted (core dumped)
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f8104088387 in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f7fd99d7700 (LWP 27834))]
(gdb) bt
#0 0x00007f8104088387 in raise () from /lib64/libc.so.6
#1 0x00007f8104089a78 in abort () from /lib64/libc.so.6
#2 0x00007f81040caf67 in __libc_message () from /lib64/libc.so.6
#3 0x00007f81040cb04e in __libc_fatal () from /lib64/libc.so.6
#4 0x00007f81040cb3c3 in _IO_vtable_check () from /lib64/libc.so.6
#5 0x00007f810409aa9b in vfprintf () from /lib64/libc.so.6
#6 0x00007f81040a5447 in fprintf () from /lib64/libc.so.6
#7 0x00007f7fd8152990 in Log::log(LogLevel, char const*, __va_list_tag*) () from /root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so
#8 0x00007f7fd8152d93 in Log::warn(char const*, ...) () from /root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so
#9 0x00007f7fd8153ed6 in PerfEvents::createForThread(int) () from /root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so
#10 0x00007f7fd814900e in pthread_setspecific_hook(unsigned int, void const*) () from /root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so
#11 0x00007f8103ab96b1 in ThreadLocalStorage::set_thread(Thread*) () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#12 0x00007f81038e29a1 in java_start(Thread*) () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#13 0x00007f8104a50ea5 in start_thread (arg=0x7f7fd99d7700) at pthread_create.c:307
#14 0x00007f8104150b0d in clone () from /lib64/libc.so.6
- bug 2, log loss
The log file is unlinked in the end of start. If threads are created frequently and mmap failed, we still fprintf to the unlinked file, this leads to log loss, better to stdout or stderr.
lsof -p 1403|grep lo
java 1403 admin 7w REG 253,1 309493 393558 /tmp/asprof-log.5813.1403 (deleted)
- issue 3, the description of
/proc/sys/kernel/perf_event_mlock_kb
is not accurate. This config is per cpu core limit, not the total. Better one is8 * threads / cpu-core-number
. see perf-security
Expected vs. actual behavior
- Expected: not crash and no log loss
- Actual: high probability of crash, log loss
Reproduction Steps
- switch to root user, run
echo 64 > /proc/sys/kernel/perf_event_mlock_kb
(this step can be skipped if perf_event_mlock_kb is not very large. change to small value can reproduce more quickly) - switch to non-root linux user and start Java (the code creates threads frequently)
- label (for latter use)
- asprof start -e cpu jps
- asprof stop jps
- goto label
Additional Information/Context
I would like to fix this issue, the PR is preparing.
- the crash, use read/write lock to protect FILE* _file
- the log loss, we can detect and write to stdout
- the config, modify the description
If you have better ideas or design, I will be very grateful.
Async-profiler version
latest
Environment details
CentOS Linux release 7.9.2009 (Core)
I think the issue can be reproduced on all Java versions.