Tags: firezone/firezone
fix(connlib): treat `ENOBUFS` as `EWOULDBLOCK` (#9798)

Socket APIs across operating systems vary in how they handle back-pressure. In most cases, a non-blocking socket should return `EWOULDBLOCK` when it cannot send a given datagram and would have to block to wait for resources to free up.

It appears that macOS doesn't always behave like that. In particular, we are seeing error logs from a few users where sending a datagram fails with

> No buffer space available (os error 55)

Digging through `libc`, I've found that this error is known as `ENOBUFS` [0]. There are reports on the Apple developer forum [1] that recommend retrying when this error happens.

It is however unclear whether it is entirely safe to map this error to `EWOULDBLOCK`. Other non-blocking event-loop implementations [2] appear to do that, but we don't know whether it is fully correct.

At present, Firezone's behaviour here is to drop the packet. This means the host networking stack has to fall back to running into a timeout and re-sending the packet. This very likely negatively impacts the UX for the users hitting this.

To validate this assumption, we implement a feature flag. This allows us to ship this code but switch back to the old behaviour, should it negatively impact how Firezone behaves. In particular, if the assumption that mapping `ENOBUFS` to `EWOULDBLOCK` is safe turns out to be wrong and `kqueue` does in fact not signal readiness when more buffers are available, then we may have missing wake-ups, which would lead to a further delay in datagrams being sent.

[0]: https://github.com/rust-lang/libc/blob/8e6f36c6ba5c91c150f2d79a06ffdc3d0b83a0f6/src/unix/bsd/apple/mod.rs#L2998
[1]: https://developer.apple.com/forums/thread/42334
[2]: https://github.com/libuv/libuv/blob/aac866f39923670d9c136851f866531f5f6cd889/src/unix/stream.c#L820
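
The mapping described in this commit could look roughly like the sketch below. It is not the actual connlib code: `normalize_send_error`, the `map_enobufs` parameter (standing in for the feature flag) and the hard-coded constant are all hypothetical names chosen for illustration. Real code would more likely use `libc::ENOBUFS` and gate the check to Apple targets, since error number 55 means something else on other platforms.

```rust
use std::io;

/// `ENOBUFS` on Apple platforms, i.e. "No buffer space available (os error 55)".
/// Assumption: hard-coded here to keep the sketch dependency-free.
const ENOBUFS_APPLE: i32 = 55;

/// Hypothetical helper: rewrite `ENOBUFS` into a `WouldBlock` error so the
/// caller's existing back-pressure path (wait for writability, then retry)
/// handles it instead of dropping the datagram.
///
/// `map_enobufs` stands in for the feature flag: when it is `false`, the
/// error is passed through unchanged, which preserves the old behaviour.
fn normalize_send_error(err: io::Error, map_enobufs: bool) -> io::Error {
    if map_enobufs && err.raw_os_error() == Some(ENOBUFS_APPLE) {
        // Keep the original error as the source for logging purposes.
        return io::Error::new(io::ErrorKind::WouldBlock, err);
    }

    err
}
```

A caller would pass every failed send through this helper and treat `io::ErrorKind::WouldBlock` as "buffer the datagram and retry once the socket reports readiness". Whether `kqueue` reliably reports that readiness after `ENOBUFS` is exactly the open question the feature flag is meant to answer.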
feat(portal): add batch-insert to change logs (#9733)

Inserting a change log incurs some minor overhead for sending the query over the network and reacting to its response. In many cases, this makes up the bulk of the actual time it takes to run the change log insert.

To reduce this overhead and avoid any kind of processing delay in the WAL consumers, we introduce batch insert functionality with size `500` and timeout `30` seconds. If either of those two limits is hit, we flush the batch using `insert_all`.

`insert_all` does not use `Ecto.Changeset`, so we need to be a bit more careful about the data we insert, and check the inserted LSNs to determine what to update the acknowledged LSN pointer to.

The functionality to determine when to call the new `on_flush/1` callback lives in the replication_connection module, but the actual behavior of `on_flush/1` is left to the child modules to implement. The `Events.ReplicationConnection` module does not use flush behavior, and so does not override the defaults, which are to not use a flush mechanism.

Related: #949
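
The size-or-timeout flush rule can be sketched independently of the portal code. The portal itself is written in Elixir; the Rust below is only an illustration of the rule, and the `Batch` type and its methods are hypothetical. It also assumes the timeout is measured from the first item added to an empty batch, which the commit message does not state.

```rust
use std::time::{Duration, Instant};

/// Hypothetical batch buffer that signals a flush when it is full or when
/// its oldest item has waited at least `timeout`.
struct Batch<T> {
    items: Vec<T>,
    max_size: usize,
    timeout: Duration,
    /// When the first item of the current batch arrived (assumption).
    started_at: Option<Instant>,
}

impl<T> Batch<T> {
    fn new(max_size: usize, timeout: Duration) -> Self {
        Self { items: Vec::new(), max_size, timeout, started_at: None }
    }

    /// Adds an item and returns the whole batch if a threshold was hit.
    fn push(&mut self, item: T, now: Instant) -> Option<Vec<T>> {
        if self.started_at.is_none() {
            self.started_at = Some(now);
        }
        self.items.push(item);
        self.take_if_due(now)
    }

    /// Returns the batch if it is full or has timed out, else `None`.
    /// Intended to also be called from a periodic timer.
    fn take_if_due(&mut self, now: Instant) -> Option<Vec<T>> {
        let started = self.started_at?; // An empty batch never flushes.
        let full = self.items.len() >= self.max_size;
        let expired = now.duration_since(started) >= self.timeout;

        if full || expired {
            self.started_at = None;
            return Some(std::mem::take(&mut self.items));
        }

        None
    }
}
```

With the values from the commit message this would be constructed as `Batch::new(500, Duration::from_secs(30))`, and whatever `push` or `take_if_due` returns would be handed to the bulk insert in one call.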
ci: retry authentication with GCP (#9786)

At present, it appears that `actions/toolkit` has a bug where it isn't always able to correctly fetch an ID token. See actions/toolkit#2098 for the upstream issue.

As a result, our CI fails relatively often. A simple restart usually fixes the issue. This however is annoying because it means PRs get de-queued from the merge-queue or don't queue in the first place and therefore require baby-sitting.

To fix this, we attempt to build a retry-mechanism from within the action. Using `continue-on-error`, we tell the "auth" step to continue, even if it fails. Following that, we try to authenticate again, but only if the previous attempt failed. We do this up to 3 times before actually giving up.

---------

Co-authored-by: Jamil Bou Kheir <jamilbk@users.noreply.github.com>
fix(android): fix view state lifecycle around tunnel/auth (#9621)

`onViewCreated()` is called when the view initializes, and then `onResume()` is called right after, as well as any time the view is shown again.

To prevent showing the VPN permission activity twice, we remove the `checkTunnelState()` call from `onViewCreated()`, allowing only `onResume()` to call it. A boolean flag is added to track whether this is the "first" launch of the app in order to determine whether to `connectOnStart`.

Fixes #9584

---------

Signed-off-by: Jamil <jamilbk@users.noreply.github.com>
fix(ci): lock xcode major (#9585) Apple won't allow apps built with Xcode betas to be reviewed. <img width="1146" alt="Screenshot 2025-06-19 at 9 04 17 AM" src="https://github.com/user-attachments/assets/11470f04-603b-4c5c-aad2-fba0e4eb391a" />
fix(portal): ensure sentry reports conditional migrations (#9582) Sentry isn't started when this runs, so start it and manually capture a message to ensure we're reminded about pending conditional migrations. Verified that this works with the Release script.
fix(apple): disable false-positive "App hang" reports (#9568) As recommended by the Sentry team [0], "App hang" tracking should be disabled before calling into certain system APIs like showing alerts to prevent false-positives. [0]: getsentry/sentry-cocoa#3472