这是indexloc提供的服务,不要输入任何密码
Skip to content

Conversation

@adrian-niculescu
Copy link
Contributor

@adrian-niculescu adrian-niculescu commented Nov 10, 2025

Follow-up: fixes #721

PR #789 added a mutex but used tryLock(), which released the mutex before connection completed, still allowing overlapping renegotiations (sorry, @davidliu).

Fix

  • Use withLock() instead of tryLock() so negotiation attempts wait instead of being skipped
  • Hold the mutex until connection completes (or times out) to prevent overlapping renegotiations
  • Move hasPublished = true before the client.isConnected check for proper reconnect handling
  • Add @Volatile to hasPublished for thread-safe memory visibility
  • Ensure the negotiation mutex only waits until the publisher connection is either connected or definitively failed (waitUntilConnectedOrDisconnected), so failed negotiations don't delay reconnect.

@changeset-bot
Copy link

changeset-bot bot commented Nov 10, 2025

🦋 Changeset detected

Latest commit: 405373a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
client-sdk-android Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@adrian-niculescu adrian-niculescu force-pushed the fix/publisher-negotiation-first-publish branch 2 times, most recently from f157148 to 50f86d5 Compare November 10, 2025 14:17
@adrian-niculescu adrian-niculescu marked this pull request as draft November 13, 2025 17:19
@adrian-niculescu adrian-niculescu marked this pull request as ready for review November 13, 2025 18:47
// Best-effort wait for publisher to connect before releasing mutex
// to avoid overlapping renegotiations.
withTimeoutOrNull(MAX_ICE_CONNECT_TIMEOUT_MS.toLong()) {
publisherObserver.waitUntilConnected()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, there's a change that's on my todo list that should ensure we ignore offers/answers that come from old negotiations (server offer/answers now come with ids associated with them). I wonder if that's the missing piece here that will alleviate the issue of overlapping renegotiations.

My main concern here is that waiting for each negotiation to finish each time will cause the overall connection time to increase quite a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidliu - fair to be careful about this. I analyzed the code flow to understand the actual impact:

The mutex + waitUntilConnected() is only held while the publisher PeerConnection is not yet in CONNECTED. Once it's connected, additional negotiatePublisher() calls return immediately, so normal renegotiations don't pay extra latency.

Where your concern is valid is in a failure + soft reconnect path: if the initial negotiation fails and the publisher goes to FAILED, the current code still holds the mutex until the 20s timeout, so the first soft reconnect attempt can't start its own negotiation until that timeout elapses. That can add up to MAX_ICE_CONNECT_TIMEOUT_MS (~20s) to the reconnection time in that scenario.

A targeted fix would be to have waitUntilConnected() return not only when the connection becomes CONNECTED, but also when it becomes "disconnected" (FAILED/CLOSED). That way we still serialize negotiations, but we don't delay reconnect: the mutex is released promptly on failure and the reconnect negotiation can start immediately. The negotiation-ID filtering you mentioned would complement this, but doesn't by itself remove the extra wait.

Copy link
Contributor Author

@adrian-niculescu adrian-niculescu Nov 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidliu Quick follow-up on this after digging deeper into the flow and your latency concern.

Based on the analysis I mentioned earlier, I updated the implementation to address the failure + soft reconnect case without giving up the “no overlapping renegotiations” guarantee:

  • The mutex is still only used to serialize publisher negotiations, but the wait helper is now waitUntilConnectedOrDisconnected(), which returns as soon as the publisher PeerConnection is either CONNECTED or has definitively terminated (FAILED/CLOSED).
  • In the normal / steady state (publisher already connected), additional negotiatePublisher() calls effectively don’t wait — the first emission is already CONNECTED, so renegotiations don’t pay extra latency.
  • In the failure + soft reconnect scenario you called out, we no longer hold the mutex for the full MAX_ICE_CONNECT_TIMEOUT_MS when the connection has already failed. The helper exits as soon as the publisher reaches a terminal failure state, so the soft reconnect can immediately start a fresh negotiation instead of being blocked for ~20s.
  • DISCONNECTED is still treated as transient (per the existing isDisconnected() extension), so we keep waiting through temporary drops where ICE can still recover.

So the behavior now is: serialize negotiations while the connection is in a transient state, but release the lock promptly once it’s either successfully connected or clearly dead. Your planned change to use negotiation IDs on offers/answers will still be a nice complementary layer on the signaling side; this PR just tightens the local sequencing so we don’t introduce avoidable reconnect latency in the failure path.

@adrian-niculescu adrian-niculescu force-pushed the fix/publisher-negotiation-first-publish branch 2 times, most recently from 248f106 to e85dfcb Compare November 16, 2025 20:13
@adrian-niculescu adrian-niculescu marked this pull request as draft November 16, 2025 20:22
@adrian-niculescu adrian-niculescu force-pushed the fix/publisher-negotiation-first-publish branch from e85dfcb to 429f14a Compare November 16, 2025 20:49
@adrian-niculescu adrian-niculescu marked this pull request as ready for review November 16, 2025 20:55
@adrian-niculescu adrian-niculescu force-pushed the fix/publisher-negotiation-first-publish branch from 429f14a to e9153df Compare November 16, 2025 20:58
@adrian-niculescu adrian-niculescu force-pushed the fix/publisher-negotiation-first-publish branch from 0a3bb58 to 96a5cf8 Compare November 17, 2025 18:18
@adrian-niculescu adrian-niculescu force-pushed the fix/publisher-negotiation-first-publish branch from 96a5cf8 to 405373a Compare November 17, 2025 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Connection instability when receiving and sending data messages immediately after joining a room

2 participants