+
Skip to content

Conversation

KJTsanaktsidis
Copy link
Contributor

@KJTsanaktsidis KJTsanaktsidis commented Aug 16, 2024

If we don't do this, callbacks registered before fork will never actually fire after forking, and just hang forever.

We explicitly re-initialize the entire context (including re-initializing the mutexes and clearing out the list of callbacks), because:

  • If a thread in the parent was in the process of calling a callback when we fork, the mutex would otherwise be "stuck on"
  • I don't think we want to actually execute any pending callbacks from other threads in the child process, so clearing the list makes sense anyway.

Fixes: #1114

@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/restart_cbthread_after_fork branch 3 times, most recently from 8576d92 to 8bdfaf7 Compare August 16, 2024 03:16
@KJTsanaktsidis
Copy link
Contributor Author

:( I accidently pushed a few versions of this PR which hang forever, and github seems to be waiting for all the previous enqueued jobs to time out (after six hours by default! :o) before running the tests on the latest version of this PR. So I guess I'm going to have to wait till tomorrow to get the test results here.

@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/restart_cbthread_after_fork branch from 8bdfaf7 to 7bc583d Compare August 16, 2024 07:42
If we don't do this, callbacks registered before fork will never
actually fire after forking, and just hang forever.

We explicitly re-initialize the entire context (including
re-initializing the mutexes and clearing out the list of callbacks),
because:

* If a thread in the parent was in the process of calling a callback
  when we fork, the mutex would otherwise be "stuck on"
* I don't think we _want_ to actually execute any pending callbacks from
  other threads in the child process, so clearing the list makes sense
  anyway.
@KJTsanaktsidis KJTsanaktsidis force-pushed the ktsanaktsidis/restart_cbthread_after_fork branch from 7bc583d to 1f4989f Compare August 16, 2024 07:46
@larskanis
Copy link
Member

I canceled the CI jobs. Looks good now!

@KJTsanaktsidis
Copy link
Contributor Author

Thanks! Looks like I ticked all the old rubies the right way, I think

Comment on lines +225 to +231
/* n.b. we _used_ to try and destroy the mutex/cond before initializing here,
* but it's undefined what happens if you try and destory an unitialized cond.
* glibc in particular seems to wait for any concurrent waiters to finish before
* destroying a condvar, trying to destroy a condvar after fork that someone was
* waiting on pre-fork won't work. Just re-init he memory directly. */
pthread_mutex_init(&ctx->async_cb_mutex, NULL);
pthread_cond_init(&ctx->async_cb_cond, NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I wonder if, given the comment, it would be cleaner to use

ctx->async_cb_mutex = PTHREAD_MUTEX_INITIALIZER;
ctx->async_cb_cond = PTHREAD_COND_INITIALIZER;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess they're equivalent, yeah. I can change this if we prefer.

@beauraF
Copy link

beauraF commented Sep 10, 2024

Hello 👋

Just to let you know that we have deploy this branch in our test environment and this fix a fork safety issue we had with one of our internal library that use FFI, without any other side effects as of today

Thanks a lot @KJTsanaktsidis 🙏

@mensfeld
Copy link

@beauraF you can use the karafka rdkafka fix meanwhile (rebuild the needed classes post-fork) to mitigate this.

FYI I am also waiting on this.

CaptainAchab pushed a commit to TankerHQ/sdk-ruby that referenced this pull request Sep 12, 2024
We're waiting on a PR on the upstream Ruby FFI repo for fork-safety:
ffi/ffi#1117

This simple workaround also prevents the issue,
so we don't have to wait.
@beauraF
Copy link

beauraF commented Sep 13, 2024

without any other side effects as of today

I'd like to come back on this, one of our CI pipelines is blocked when we use this branch. Unfortunately I haven't had time to investigate further yet. No issues with @mensfeld workaround

@larskanis
Copy link
Member

I'd like to come back on this, one of our CI pipelines is blocked when we use this branch.

Thank you @KJTsanaktsidis for this great work! I think this can be merged, but the blocking CI should be investigated first. @beauraF It would be great if you can investigate it, or you can provide more details.

I didn't notice any side effects on my own, but I don't have any use cases with fork() other than the ffi specs.

@KJTsanaktsidis
Copy link
Contributor Author

Yeah @beauraF if you're able to provide some details, even if not a full reproduction, I can have a little stare at the code and think about what might be happening.

@beauraF
Copy link

beauraF commented Sep 18, 2024

@beauraF It would be great if you can investigate it, or you can provide more details.

Yeah @beauraF if you're able to provide some details, even if not a full reproduction, I can have a little stare at the code and think about what might be happening.

So.. I just investigated, and the issue was fully on our side. Sorry about that..
I confirm this is fixing our issue. Thanks again 🙏

@KJTsanaktsidis
Copy link
Contributor Author

Great news, thanks for confirming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Issue with post-fork callback execution (FFI Callback Dispatcher not restarted)

5 participants

点击 这是indexloc提供的php浏览器服务,不要输入任何密码和下载