这是indexloc提供的服务,不要输入任何密码
Skip to content

Log related crash and bugs #1084

@yanglong1010

Description

@yanglong1010

Describe the bug

  • bug 1, crash
    If use asprof to start and stop, the stop step is to run stop,file=/tmp/asprof.31081.1403,log=/tmp/asprof-log.31081.1403 and Log::open is invoked. Log::open closes the FILE* _file created by start.
    If threads are created frequently, then PerfEvents::createForThread is called frequently. If mmap failed due to /proc/sys/kernel/perf_event_mlock_kb limit (should use non-root user, see Reproduction Steps bellow), then async-profiler prints perf_event mmap failed to log file.
    fprintf to a closed FILE* is undefined. In my case, this leads to crash. The stack trace may be one of the following 3 kinds.
#0  0x00007f9d03360387 in raise () from /lib64/libc.so.6
#1  0x00007f9d03361a78 in abort () from /lib64/libc.so.6
#2  0x00007f9d02bbbb29 in os::abort(bool) () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#3  0x00007f9d02de2b9c in VMError::report_and_die() () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#4  0x00007f9d02bc5ce5 in JVM_handle_linux_signal () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#5  0x00007f9d02bb8bd8 in signalHandler(int, siginfo_t*, void*) () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#6  0x00007f9cb8943949 in Profiler::segvHandler (signo=11, siginfo=0x7f9cba9e75f0, ucontext=0x7f9cba9e74c0) at src/profiler.cpp:893
#7  <signal handler called>
#8  0x00007f9d03372802 in vfprintf () from /lib64/libc.so.6
#9  0x00007f9d0337d447 in fprintf () from /lib64/libc.so.6
#10 0x00007f9cb893c15a in Log::log (level=LOG_WARN, msg=0x7f9cb898c656 "perf_event mmap failed: %s", args=0x7f9cba9e8cc8) at src/log.cpp:87
#11 0x00007f9cb893c355 in Log::warn (msg=0x7f9cb898c656 "perf_event mmap failed: %s") at src/log.cpp:116
#12 0x00007f9cb893f7dc in PerfEvents::createForThread (this=0x7f9cb8bb09a8 <perf_events>, tid=10811) at src/perfEvents_linux.cpp:612
#13 0x00007f9cb8928bd7 in CpuEngine::onThreadStart () at src/cpuEngine.cpp:47
#14 0x00007f9cb8928b7f in pthread_setspecific_hook (key=1, value=0x7f9c64031000) at src/cpuEngine.cpp:36
#15 0x00007f9d02d916b1 in ThreadLocalStorage::set_thread(Thread*) () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#16 0x00007f9d02bba9a1 in java_start(Thread*) () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#17 0x00007f9d03d28ea5 in start_thread (arg=0x7f9cba9e9700) at pthread_create.c:307
#18 0x00007f9d03428b0d in clone () from /lib64/libc.so.6
*** Error in `java': corrupted double-linked list: 0x00007f441802d630 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x7f474)[0x7f44ffc26474]
/lib64/libc.so.6(+0x82c38)[0x7f44ffc29c38]
/lib64/libc.so.6(__libc_malloc+0x4c)[0x7f44ffc2c78c]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x65c4d)[0x7f43ecd7ec4d]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x5646a)[0x7f43ecd6f46a]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x50afc)[0x7f43ecd69afc]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x483a7)[0x7f43ecd613a7]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x43ace)[0x7f43ecd5cace]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x40590)[0x7f43ecd59590]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x10116)[0x7f43ecd29116]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x2f347)[0x7f43ecd48347]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x2e584)[0x7f43ecd47584]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x3039b)[0x7f43ecd4939b]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(+0x30802)[0x7f43ecd49802]
/root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so(Agent_OnAttach+0xe3)[0x7f43ecd55d90]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so(+0x7c12d9)[0x7f44ff2b62d9]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so(+0x3328df)[0x7f44fee278df]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so(+0xb11d32)[0x7f44ff606d32]
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so(+0x942a62)[0x7f44ff437a62]
/lib64/libpthread.so.0(+0x7ea5)[0x7f45005a5ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f44ffca5b0d]
Fatal error: glibc detected an invalid stdio handle
Aborted (core dumped)

Program terminated with signal SIGABRT, Aborted.
#0  0x00007f8104088387 in raise () from /lib64/libc.so.6
[Current thread is 1 (Thread 0x7f7fd99d7700 (LWP 27834))]
(gdb) bt
#0  0x00007f8104088387 in raise () from /lib64/libc.so.6
#1  0x00007f8104089a78 in abort () from /lib64/libc.so.6
#2  0x00007f81040caf67 in __libc_message () from /lib64/libc.so.6
#3  0x00007f81040cb04e in __libc_fatal () from /lib64/libc.so.6
#4  0x00007f81040cb3c3 in _IO_vtable_check () from /lib64/libc.so.6
#5  0x00007f810409aa9b in vfprintf () from /lib64/libc.so.6
#6  0x00007f81040a5447 in fprintf () from /lib64/libc.so.6
#7  0x00007f7fd8152990 in Log::log(LogLevel, char const*, __va_list_tag*) () from /root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so
#8  0x00007f7fd8152d93 in Log::warn(char const*, ...) () from /root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so
#9  0x00007f7fd8153ed6 in PerfEvents::createForThread(int) () from /root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so
#10 0x00007f7fd814900e in pthread_setspecific_hook(unsigned int, void const*) () from /root/ap/async-profiler/build/bin/../lib/libasyncProfiler.so
#11 0x00007f8103ab96b1 in ThreadLocalStorage::set_thread(Thread*) () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#12 0x00007f81038e29a1 in java_start(Thread*) () from /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.412.b08-1.el7_9.x86_64/jre/lib/amd64/server/libjvm.so
#13 0x00007f8104a50ea5 in start_thread (arg=0x7f7fd99d7700) at pthread_create.c:307
#14 0x00007f8104150b0d in clone () from /lib64/libc.so.6
  • bug 2, log loss
    The log file is unlinked in the end of start. If threads are created frequently and mmap failed, we still fprintf to the unlinked file, this leads to log loss, better to stdout or stderr.
lsof -p 1403|grep lo
java    1403 admin    7w      REG              253,1    309493    393558 /tmp/asprof-log.5813.1403 (deleted)
  • issue 3, the description of /proc/sys/kernel/perf_event_mlock_kb is not accurate. This config is per cpu core limit, not the total. Better one is 8 * threads / cpu-core-number. see perf-security

Expected vs. actual behavior

  • Expected: not crash and no log loss
  • Actual: high probability of crash, log loss

Reproduction Steps

  • switch to root user, run echo 64 > /proc/sys/kernel/perf_event_mlock_kb (this step can be skipped if perf_event_mlock_kb is not very large. change to small value can reproduce more quickly)
  • switch to non-root linux user and start Java (the code creates threads frequently)
  • label (for latter use)
  • asprof start -e cpu jps
  • asprof stop jps
  • goto label

Additional Information/Context

I would like to fix this issue, the PR is preparing.

  • the crash, use read/write lock to protect FILE* _file
  • the log loss, we can detect and write to stdout
  • the config, modify the description

If you have better ideas or design, I will be very grateful.

Async-profiler version

latest

Environment details

CentOS Linux release 7.9.2009 (Core)
I think the issue can be reproduced on all Java versions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions