Restart async callback dispatcher thread after fork #1117

KJTsanaktsidis · 2024-08-16T01:53:21Z

If we don't do this, callbacks registered before fork() never actually fire in the child process; they just hang forever.

We explicitly re-initialize the entire context (including re-initializing the mutexes and clearing out the list of callbacks), because:

* If a thread in the parent was in the middle of calling a callback when we fork, the mutex would otherwise be "stuck on" (locked by a thread that does not exist in the child).
* I don't think we _want_ to actually execute any pending callbacks from other threads in the child process, so clearing the list makes sense anyway.
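To make the approach above concrete, here is a minimal sketch written against plain POSIX threads rather than Ruby's thread API. Everything in it (the ctx layout, dispatcher_init, dispatcher_enqueue, the demo helpers, and the use of pthread_atfork as the trigger) is hypothetical and is not the code in ext/ffi_c/Function.c. It only illustrates the idea: in the child, re-initialise the mutex, condvar and pending list without destroying them first, then start a fresh dispatcher thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical pending callback: fn(arg) runs on the dispatcher thread. */
struct queued_cb {
    void (*fn)(void *arg);
    void *arg;
    struct queued_cb *next;
};

/* Hypothetical dispatcher context: lock, condvar, pending list, thread. */
static struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    struct queued_cb *head;
    pthread_t thread;
} ctx;

static void *dispatcher_main(void *unused) {
    (void)unused;
    pthread_mutex_lock(&ctx.mutex);
    for (;;) {
        while (ctx.head == NULL)
            pthread_cond_wait(&ctx.cond, &ctx.mutex);
        struct queued_cb *cb = ctx.head;
        ctx.head = cb->next;
        pthread_mutex_unlock(&ctx.mutex); /* run callbacks without the lock */
        cb->fn(cb->arg);
        free(cb);
        pthread_mutex_lock(&ctx.mutex);
    }
    return NULL;
}

/* First-time setup and pthread_atfork child handler. No *_destroy calls:
 * in the child, the old mutex/condvar may be held or waited on by parent
 * threads that fork() did not copy, so the memory is simply re-initialised. */
static void dispatcher_init(void) {
    pthread_mutex_init(&ctx.mutex, NULL);
    pthread_cond_init(&ctx.cond, NULL);
    ctx.head = NULL; /* drop (and leak) callbacks queued by parent threads */
    /* Assumes the platform tolerates pthread_create() in a fork child. */
    if (pthread_create(&ctx.thread, NULL, dispatcher_main, NULL) != 0)
        abort();
    pthread_detach(ctx.thread);
}

static void dispatcher_setup(void) {
    dispatcher_init();
    pthread_atfork(NULL, NULL, dispatcher_init);
}

static bool dispatcher_enqueue(void (*fn)(void *), void *arg) {
    struct queued_cb *cb = malloc(sizeof *cb);
    if (cb == NULL)
        return false;
    cb->fn = fn;
    cb->arg = arg;
    pthread_mutex_lock(&ctx.mutex);
    cb->next = ctx.head; /* LIFO order keeps the sketch short */
    ctx.head = cb;
    pthread_cond_signal(&ctx.cond);
    pthread_mutex_unlock(&ctx.mutex);
    return true;
}

/* Demo helper: does a newly queued callback run within timeout_ms? */
static atomic_int fired;
static void set_fired(void *arg) { (void)arg; atomic_store(&fired, 1); }

static bool callback_fires_within(int timeout_ms) {
    atomic_store(&fired, 0);
    if (!dispatcher_enqueue(set_fired, NULL))
        return false;
    struct timespec tick = {0, 10 * 1000 * 1000}; /* 10 ms */
    for (int waited = 0; waited < timeout_ms && !atomic_load(&fired); waited += 10)
        nanosleep(&tick, NULL);
    return atomic_load(&fired) != 0;
}

/* Demo helper: fork and report whether callbacks still fire in the child. */
static bool child_callbacks_work(void) {
    pid_t pid = fork();
    if (pid < 0)
        return false;
    if (pid == 0)
        _exit(callback_fires_within(2000) ? 0 : 1);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Call dispatcher_setup() once at startup; after that, child_callbacks_work() should return true. Without the pthread_atfork registration, the child has no dispatcher thread, so a queued callback never runs, which is the hang described above. Note that POSIX only guarantees async-signal-safe calls in the child of a multithreaded process, so creating a thread there is a pragmatic, platform-dependent choice rather than something the standard promises.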

Fixes: #1114

KJTsanaktsidis · 2024-08-16T03:19:10Z

:( I accidentally pushed a few versions of this PR that hang forever, and GitHub seems to be waiting for all the previously enqueued jobs to time out (after six hours by default! :o) before running the tests on the latest version. So I guess I'll have to wait until tomorrow to get the test results here.

larskanis · 2024-08-16T08:48:12Z

I canceled the CI jobs. Looks good now!

KJTsanaktsidis · 2024-08-16T09:11:21Z

Thanks! Looks like I ticked all the old rubies the right way, I think

ivoanjo · 2024-08-16T10:26:45Z

ext/ffi_c/Function.c

+        /* n.b. we _used_ to try and destroy the mutex/cond before initializing here,
+         * but it's undefined what happens if you try and destroy an uninitialized cond.
+         * glibc in particular seems to wait for any concurrent waiters to finish before
+         * destroying a condvar, so trying to destroy a condvar after fork that someone was
+         * waiting on pre-fork won't work. Just re-init the memory directly. */
+        pthread_mutex_init(&ctx->async_cb_mutex, NULL);
+        pthread_cond_init(&ctx->async_cb_cond, NULL);


Minor: I wonder if, given the comment, it would be cleaner to use

ctx->async_cb_mutex = PTHREAD_MUTEX_INITIALIZER;
ctx->async_cb_cond = PTHREAD_COND_INITIALIZER;

KJTsanaktsidis · 2024-08-18

I guess they're equivalent, yeah. I can change this if we prefer.
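For comparison, a small sketch of both options discussed in this review thread. The struct async_ctx wrapper and both function names are hypothetical; only the async_cb_mutex / async_cb_cond field names come from the diff. It assumes a libc such as glibc, where the initializer macros expand to brace-enclosed initializer lists, so assigning them to an existing struct member needs a compound literal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical wrapper; only the two field names come from the diff. */
struct async_ctx {
    pthread_mutex_t async_cb_mutex;
    pthread_cond_t async_cb_cond;
};

/* Option 1, as in the diff: dynamic initialisation, which can report errors. */
static int reinit_with_init_calls(struct async_ctx *ctx) {
    int rc = pthread_mutex_init(&ctx->async_cb_mutex, NULL);
    if (rc != 0)
        return rc;
    return pthread_cond_init(&ctx->async_cb_cond, NULL);
}

/* Option 2, as suggested in review: the static initializer macros. On
 * common libcs they are brace-enclosed lists, so a bare `x = MACRO;` is
 * not valid C; the compound literal casts turn them into expressions. */
static void reinit_with_initializers(struct async_ctx *ctx) {
    ctx->async_cb_mutex = (pthread_mutex_t)PTHREAD_MUTEX_INITIALIZER;
    ctx->async_cb_cond = (pthread_cond_t)PTHREAD_COND_INITIALIZER;
}
```

POSIX describes PTHREAD_MUTEX_INITIALIZER as equivalent to calling pthread_mutex_init() with a NULL attribute argument, except that no error checks are performed, so either form should behave the same here; the _init calls are the more conventional choice for memory that is not statically allocated.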

beauraF · 2024-09-10T07:56:05Z

Hello 👋

Just to let you know that we have deployed this branch in our test environment, and it fixes a fork-safety issue we had with one of our internal libraries that uses FFI, without any other side effects so far.

Thanks a lot @KJTsanaktsidis 🙏

mensfeld · 2024-09-10T08:03:25Z

@beauraF you can use the karafka rdkafka fix in the meantime (rebuilding the needed classes post-fork) to mitigate this.

FYI I am also waiting on this.

We're waiting on a PR on the upstream Ruby FFI repo for fork-safety: ffi/ffi#1117. This simple workaround also prevents the issue, so we don't have to wait.

beauraF · 2024-09-13T14:22:14Z

without any other side effects as of today

I'd like to come back to this: one of our CI pipelines is blocked when we use this branch. Unfortunately, I haven't had time to investigate further yet. No issues with @mensfeld's workaround.

larskanis · 2024-09-13T19:35:43Z

I'd like to come back to this: one of our CI pipelines is blocked when we use this branch.

Thank you @KJTsanaktsidis for this great work! I think this can be merged, but the blocking CI should be investigated first. @beauraF It would be great if you can investigate it, or you can provide more details.

I didn't notice any side effects on my own, but I don't have any use cases with fork() other than the ffi specs.

KJTsanaktsidis · 2024-09-17T02:03:14Z

Yeah @beauraF if you're able to provide some details, even if not a full reproduction, I can have a little stare at the code and think about what might be happening.

beauraF · 2024-09-18T08:31:34Z

@beauraF It would be great if you can investigate it, or you can provide more details.

Yeah @beauraF if you're able to provide some details, even if not a full reproduction, I can have a little stare at the code and think about what might be happening.

So... I just investigated, and the issue was entirely on our side. Sorry about that!
I can confirm this fixes our issue. Thanks again 🙏

KJTsanaktsidis · 2024-09-18T09:33:19Z

Great news, thanks for confirming!

KJTsanaktsidis force-pushed the ktsanaktsidis/restart_cbthread_after_fork branch 3 times, most recently from 8576d92 to 8bdfaf7 on August 16, 2024 03:16

KJTsanaktsidis closed this Aug 16, 2024

KJTsanaktsidis reopened this Aug 16, 2024

KJTsanaktsidis force-pushed the ktsanaktsidis/restart_cbthread_after_fork branch from 8bdfaf7 to 7bc583d on August 16, 2024 07:42

KJTsanaktsidis force-pushed the ktsanaktsidis/restart_cbthread_after_fork branch from 7bc583d to 1f4989f on August 16, 2024 07:46

ivoanjo reviewed Aug 16, 2024

mensfeld approved these changes Aug 19, 2024

larskanis merged commit 5df52c0 into ffi:master Sep 19, 2024

tagliala mentioned this pull request Jan 2, 2025

Please add changelog and push tag for 1.17.1 #1135 (closed)

larskanis mentioned this pull request Jan 10, 2025

Fork method and Phusion Passenger #1137 (open)

joshuay03 mentioned this pull request Jul 24, 2025

Mark callback dispatcher thread as fork safe for Puma #1156 (open)

5 participants