Conditional Focus
Problem
When an application starts capturing a display-surface, the user agent faces a decision: should the captured display-surface be brought to the forefront, or should the capturing application retain focus?
The user agent is mostly agnostic of the nature of the capturing and captured applications, and is therefore ill-positioned to make an informed decision.
In contrast, the capturing application is familiar with its own properties, and is therefore better suited to make this decision. Moreover, by reading displaySurface and/or using Capture Handle, the capturing application can learn about the captured display-surface, driving an even more informed decision.
Sample Use Case
For example, a video conferencing application may wish to:
- Focus a captured application that users typically interact with during the call, like a text editor.
- Retain focus for itself when the captured display-surface is non-interactive content, like a video.
- (Using Capture Handle, the capturing application may even allow the user to remotely start/pause the video.)
Proposed Solution
- Recall that getDisplayMedia() returns a Promise<MediaStream>, and that said MediaStream is guaranteed to contain at least one video track.
- When getDisplayMedia() is called and the user approves the capture of a tab or a window, the video track will be of type FocusableMediaStreamTrack, subclassing MediaStreamTrack. FocusableMediaStreamTrack exposes a focus() method.
  - This method may only be called on the microtask on which the aforementioned Promise was resolved. Later invocations of focus() produce an error.
  - Calls to focus() that occur more than 1s after capture started are silently ignored, preventing the application from performing a busy-wait on the aforementioned microtask and then calling focus() later.
  - Calling focus("no-focus-change") leads to focus being retained by the capturing application. Calling focus("focus-captured-surface") immediately switches focus to the captured tab/window.
  - Not calling focus() at all, or calling it too late, leaves the decision in the hands of the user agent.
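The rules above suggest that an application should guard its call to focus(), since the method is only described for tab and window captures and is said to produce an error when invoked too late. The following is a minimal sketch under those assumptions; applyFocusDecision and its return labels are hypothetical names, not part of the proposal.

```javascript
// Hypothetical helper; not part of the proposal or of any browser API.
// Returns a label describing what happened to the focus request.
function applyFocusDecision(track, shouldFocusCaptured) {
  // The proposal only describes focus() on tracks from tab/window captures,
  // so other tracks are assumed here not to expose it.
  if (typeof track.focus !== "function") {
    return "unsupported";
  }
  try {
    track.focus(shouldFocusCaptured ? "focus-captured-surface" : "no-focus-change");
    return "requested";
  } catch (err) {
    // Per the proposal, a call made after the resolving microtask produces an error.
    return "rejected";
  }
}
```

For the request to take effect, a helper like this would need to run immediately after the getDisplayMedia() Promise resolves, with no intervening await.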
Suggested-spec
See spec-draft for the full description of the suggested solution.
Demo
- This solution is implemented in Chrome starting with m95. It is gated by --enable-blink-features=ConditionalFocus. (Or enable Experimental Web Platform features on chrome://flags.)
- A demo is available. It works with Chrome m95 and up.
Sample Code
const stream = await navigator.mediaDevices.getDisplayMedia();
const [track] = stream.getVideoTracks();
// focus() must be called right away, before yielding back to the event loop.
track.focus(ShouldFocus(stream) ? "focus-captured-surface" : "no-focus-change");

function ShouldFocus(stream) {
  const [track] = stream.getVideoTracks();
  if (sampleUsesCaptureHandle) {
    // Assume logic discriminating focusability by origin,
    // for instance focusing anything except https://collaborator.com.
    const captureHandle = track.getCaptureHandle();
    return ShouldFocusOrigin(captureHandle && captureHandle.origin);
  } else {  // Assume Capture Handle is not a thing.
    // Assume the application is only interested in focusing tabs, not windows.
    return track.getSettings().displaySurface === 'browser';
  }
}
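The sample leaves ShouldFocusOrigin undefined. One possible shape, following the sample's own comment about focusing anything except https://collaborator.com, might look like the sketch below. The NO_FOCUS_ORIGINS set and the choice to focus when no origin is available are assumptions made for illustration.

```javascript
// Hypothetical list of origins that should NOT receive focus when captured.
const NO_FOCUS_ORIGINS = new Set(["https://collaborator.com"]);

// Returns true if the captured surface should be focused.
// `origin` may be null or undefined when the captured tab exposes no capture handle.
function ShouldFocusOrigin(origin) {
  if (!origin) {
    // Assumption: with no origin to go on, fall back to focusing the capture.
    return true;
  }
  return !NO_FOCUS_ORIGINS.has(origin);
}
```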
Security Concerns
One noteworthy security concern is that allowing focus to be switched at an arbitrary moment could enable clickjacking attacks. The suggested spec addresses this concern by limiting the time when focus-switching may be triggered/suppressed.
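As a rough illustration of that time limit, the toy model below captures only the 1s rule from the proposed solution: a request made within the window is honored, and anything later (or no request at all) leaves the decision to the user agent. It does not model the microtask restriction, and it is not how any user agent implements the feature. The function name and the "user-agent-decides" label are hypothetical.

```javascript
// Toy model of the timing rule; names and labels are illustrative only.
const FOCUS_WINDOW_MS = 1000;                      // the 1s limit from the proposal
const USER_AGENT_DECIDES = "user-agent-decides";   // hypothetical label

// `call` is null if the application never called focus(), otherwise
// { behavior, atMs } describing the requested behavior and when it was requested.
function effectiveFocusDecision(call, captureStartMs) {
  if (call === null) {
    return USER_AGENT_DECIDES;
  }
  if (call.atMs - captureStartMs > FOCUS_WINDOW_MS) {
    // Calls made too late are silently ignored.
    return USER_AGENT_DECIDES;
  }
  return call.behavior;
}
```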
Alternate Approaches
Alternate Approach 1: Extra Parameter to getDisplayMedia()
This would allow an application to always/never switch focus, but it would not allow the application to make a divergent decision based on what display-surface was selected by the user. The sample use case presented earlier showcases why that would not be a desirable limitation.
Alternate Approach 2: Focus Hand-Off (Capturer->Captured)
Allowing capturer->captured handoff of focus was considered and pitched to the WebRTC WG. This option is a bit scarier from a security perspective, as the capturing application could try to clickjack the user into pressing something problematic on the captured application at an inopportune moment. The risk seems limited, but it's still greater than with the current proposal.
Let's Discuss
Discussion welcome either here or on the relevant thread in the WebRTC WG.