#721 Fix publisher negotiation reliability by preventing overlapping renegotiations #797
base: main
Conversation
🦋 Changeset detected. Latest commit: 405373a. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Force-pushed from f157148 to 50f86d5.
// Best-effort wait for publisher to connect before releasing mutex
// to avoid overlapping renegotiations.
withTimeoutOrNull(MAX_ICE_CONNECT_TIMEOUT_MS.toLong()) {
    publisherObserver.waitUntilConnected()
Hmm, there's a change that's on my todo list that should ensure we ignore offers/answers that come from old negotiations (server offer/answers now come with ids associated with them). I wonder if that's the missing piece here that will alleviate the issue of overlapping renegotiations.
My main concern here is that waiting for each negotiation to finish each time will cause the overall connection time to increase quite a bit.
@davidliu - fair to be careful about this. I analyzed the code flow to understand the actual impact:
The mutex (and the waitUntilConnected() wait inside it) is only held while the publisher PeerConnection is not yet CONNECTED. Once it's connected, additional negotiatePublisher() calls return immediately, so normal renegotiations don't pay extra latency.
Where your concern is valid is in a failure + soft reconnect path: if the initial negotiation fails and the publisher goes to FAILED, the current code still holds the mutex until the 20s timeout, so the first soft reconnect attempt can't start its own negotiation until that timeout elapses. That can add up to MAX_ICE_CONNECT_TIMEOUT_MS (~20s) to the reconnection time in that scenario.
A targeted fix would be to have waitUntilConnected() return not only when the connection becomes CONNECTED, but also when it becomes "disconnected" (FAILED/CLOSED). That way we still serialize negotiations, but we don't delay reconnect: the mutex is released promptly on failure and the reconnect negotiation can start immediately. The negotiation-ID filtering you mentioned would complement this, but doesn't by itself remove the extra wait.
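To make the idea concrete, here's a minimal sketch of what that variant could look like, assuming the observer exposes its connection state as a StateFlow. ConnState and the flow plumbing are stand-ins for illustration, not the actual SDK types:

```kotlin
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.first

// Hypothetical stand-in for the WebRTC peer connection state; the real SDK
// observes the PeerConnection state callbacks instead.
enum class ConnState { NEW, CONNECTING, CONNECTED, DISCONNECTED, FAILED, CLOSED }

class PublisherObserver {
    // Assumed plumbing: state changes from the PeerConnection land in this flow.
    val state = MutableStateFlow(ConnState.NEW)

    // Current behavior: suspend until the publisher reaches CONNECTED
    // (or until the caller's withTimeoutOrNull gives up).
    suspend fun waitUntilConnected() {
        state.first { it == ConnState.CONNECTED }
    }

    // Proposed variant: also return once the connection is definitively dead,
    // so the negotiation mutex is released promptly on failure.
    suspend fun waitUntilConnectedOrDisconnected() {
        state.first {
            it == ConnState.CONNECTED || it == ConnState.FAILED || it == ConnState.CLOSED
        }
    }
}
```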
@davidliu Quick follow-up on this after digging deeper into the flow and your latency concern.
Based on the analysis I mentioned earlier, I updated the implementation to address the failure + soft reconnect case without giving up the “no overlapping renegotiations” guarantee:
- The mutex is still only used to serialize publisher negotiations, but the wait helper is now waitUntilConnectedOrDisconnected(), which returns as soon as the publisher PeerConnection is either CONNECTED or has definitively terminated (FAILED/CLOSED).
- In the normal / steady state (publisher already connected), additional negotiatePublisher() calls effectively don't wait: the first emission is already CONNECTED, so renegotiations don't pay extra latency.
- In the failure + soft reconnect scenario you called out, we no longer hold the mutex for the full MAX_ICE_CONNECT_TIMEOUT_MS when the connection has already failed. The helper exits as soon as the publisher reaches a terminal failure state, so the soft reconnect can immediately start a fresh negotiation instead of being blocked for ~20s.
- DISCONNECTED is still treated as transient (per the existing isDisconnected() extension), so we keep waiting through temporary drops where ICE can still recover.
So the behavior now is: serialize negotiations while the connection is in a transient state, but release the lock promptly once it’s either successfully connected or clearly dead. Your planned change to use negotiation IDs on offers/answers will still be a nice complementary layer on the signaling side; this PR just tightens the local sequencing so we don’t introduce avoidable reconnect latency in the failure path.
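For reference, the serialization side then looks roughly like this. It's a simplified sketch that reuses the PublisherObserver stand-in from the snippet above; the offer/answer exchange and the real constant value are omitted or assumed:

```kotlin
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock
import kotlinx.coroutines.withTimeoutOrNull

// Mirrors the timeout used in the diff above (~20s); the value is assumed here.
const val MAX_ICE_CONNECT_TIMEOUT_MS = 20_000

class PublisherNegotiator(private val publisherObserver: PublisherObserver) {
    private val negotiationMutex = Mutex()

    suspend fun negotiatePublisher() {
        // withLock: a second caller waits its turn instead of overlapping
        // (or being skipped, as with the earlier tryLock()).
        negotiationMutex.withLock {
            // ... create and send the offer, apply the answer (omitted) ...

            // Best-effort wait before releasing the mutex, bounded by the ICE
            // connect timeout so a stuck negotiation can't hold the lock forever.
            withTimeoutOrNull(MAX_ICE_CONNECT_TIMEOUT_MS.toLong()) {
                publisherObserver.waitUntilConnectedOrDisconnected()
            }
        }
    }
}
```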
Force-pushed from 248f106 to e85dfcb, then to 429f14a, then to e9153df; later from 0a3bb58 to 96a5cf8, and finally to 405373a.
Follow-up: fixes #721
PR #789 added a mutex but used tryLock(), which released the mutex before connection completed, still allowing overlapping renegotiations (sorry, @davidliu).

Fix:
- Use withLock() instead of tryLock() so negotiation attempts wait instead of being skipped.
- Set hasPublished = true before the client.isConnected check for proper reconnect handling (see the sketch below).
- Add @Volatile to hasPublished for thread-safe memory visibility.
- Release the mutex once the publisher is either connected or has definitively failed (waitUntilConnectedOrDisconnected), so failed negotiations don't delay reconnect.
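Roughly, the flag-related changes look like this. This is a sketch only: publish(), PublisherEngineSketch, and SignalClientStub are placeholders for the real engine and signaling types, and only the ordering and visibility points above are illustrated:

```kotlin
import kotlinx.coroutines.sync.Mutex
import kotlinx.coroutines.sync.withLock

// Hypothetical minimal stand-in for the signaling client referenced above.
class SignalClientStub {
    var isConnected: Boolean = false
}

class PublisherEngineSketch(private val client: SignalClientStub) {
    // @Volatile so a write on one thread is visible to the reconnect path
    // running on another thread.
    @Volatile
    var hasPublished = false
        private set

    private val negotiationMutex = Mutex()

    suspend fun publish() {
        // Record the intent to publish before the connectivity check, so a
        // reconnect racing with this call still sees that the publisher must
        // be (re)negotiated, even if we bail out below.
        hasPublished = true
        if (!client.isConnected) return

        negotiationMutex.withLock {
            // ... negotiate the publisher (see the earlier sketches) ...
        }
    }
}
```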