Conditional Focus (When Display-Capture Starts) #37

@eladalon1983

Conditional Focus

Problem

When an application starts capturing a display-surface, the user agent faces a decision: should the captured display-surface be brought to the forefront, or should the capturing application retain focus?

The user agent is mostly agnostic of the nature of the capturing and captured applications, and is therefore ill-positioned to make an informed decision.

In contrast, the capturing application is familiar with its own properties, and is therefore better suited to make this decision. Moreover, by reading displaySurface and/or using Capture Handle, the capturing application can learn about the captured display-surface, driving an even more informed decision.

Sample Use Case

For example, a video conferencing application may wish to:

  • Focus a captured application that users typically interact with during the call, like a text editor.
  • Retain focus for itself when the captured display-surface is non-interactive content, like a video.
    • (Using Capture Handle, the capturing application may even allow the user to remotely start/pause the video.)

Proposed Solution

  • Recall that getDisplayMedia() returns a Promise<MediaStream>, and that said MediaStream is guaranteed to contain at least one video track.
  • When getDisplayMedia() is called and the user approves the capture of a tab or a window, the video track will be of type FocusableMediaStreamTrack, subclassing MediaStreamTrack.
  • FocusableMediaStreamTrack exposes a focus() method.
  • This method may only be called on the microtask on which the aforementioned Promise was resolved. Later invocations of focus() produce an error.
  • Even on that microtask, calls to focus() that occur more than 1s after capture started are silently ignored. This prevents the application from busy-waiting on the microtask and then calling focus() later.
  • Calling focus("no-focus-change") leads to focus being retained by the capturing application. Calling focus("focus-captured-surface") immediately switches focus to the captured tab/window.
  • Not calling focus() at all, or calling it too late, leaves the decision in the hands of the user agent.
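
The rules above could be applied along the following lines. This is a minimal sketch, not part of the proposal: applyFocusDecision and startCapture are hypothetical helper names, and the "user-agent-default" return value is only a label used here for the case where the decision stays with the user agent.

```javascript
// Hypothetical helper: applies a focus decision to a freshly captured track.
// It is meant to be called synchronously after the getDisplayMedia() promise
// resolves, so that the call lands inside the window the proposal allows.
function applyFocusDecision(track, wantFocus) {
  // Per the proposal, only tab and window captures yield a track that
  // exposes focus(). Other tracks, or browsers without the feature, do not.
  if (typeof track.focus !== "function") {
    return "user-agent-default";
  }
  const behavior = wantFocus ? "focus-captured-surface" : "no-focus-change";
  try {
    track.focus(behavior);
    return behavior;
  } catch (err) {
    // focus() was invoked too late; the decision stays with the user agent.
    return "user-agent-default";
  }
}

// Hypothetical wrapper; `decide` is assumed to be a synchronous callback
// that receives the video track and returns true or false.
async function startCapture(decide) {
  const stream = await navigator.mediaDevices.getDisplayMedia();
  const [track] = stream.getVideoTracks();
  // No `await` between resolving the promise and this call.
  applyFocusDecision(track, decide(track));
  return stream;
}
```

Keeping the decision callback synchronous is the important design choice here: any asynchronous work (a network lookup, for example) between the promise resolving and the focus() call would push the call past the allowed microtask.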

Suggested-spec

See spec-draft for the full description of the suggested solution.

Demo

  • This solution is implemented in Chrome starting with m95. It is gated by --enable-blink-features=ConditionalFocus. (Or enable Experimental Web Platform features on chrome://flags.)
  • A demo is available. It works with Chrome m95 and up.

Sample Code

const stream = await navigator.mediaDevices.getDisplayMedia();
const [track] = stream.getVideoTracks();
track.focus(ShouldFocus(stream) ? "focus-captured-surface" : "no-focus-change");

function ShouldFocus(stream) {
  const [track] = stream.getVideoTracks();
  if (sampleUsesCaptureHandle) {
    // Assume logic discriminating focusability by origin,
    // for instance focusing anything except https://collaborator.com.
    const captureHandle = track.getCaptureHandle();
    return ShouldFocusOrigin(captureHandle && captureHandle.origin);
  } else {  // Assume Capture Handle is not a thing.
    // Assume the application is only interested in focusing tabs, not windows.
    return track.getSettings().displaySurface === 'browser';
  }
}
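
The sample leaves ShouldFocusOrigin() undefined. One possible shape for it, following the comment about focusing anything except https://collaborator.com, is sketched below. The NO_FOCUS_ORIGINS set and the choice to focus surfaces whose origin is unknown are assumptions made for illustration only.

```javascript
// Hypothetical list of origins the capturing application prefers not to focus.
const NO_FOCUS_ORIGINS = new Set(["https://collaborator.com"]);

// Returns true when the captured surface should receive focus.
// `origin` may be null or undefined when the captured page exposes
// no capture handle, or a handle without an origin.
function ShouldFocusOrigin(origin) {
  if (!origin) {
    // Assumption: surfaces of unknown origin are focused by default.
    return true;
  }
  return !NO_FOCUS_ORIGINS.has(origin);
}
```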

Security Concerns

One noteworthy security concern is that allowing focus to be switched at an arbitrary moment could enable clickjacking attacks. The suggested spec addresses this concern by limiting the time when focus-switching may be triggered/suppressed.

Alternate Approaches

Alternate Approach 1: Extra Parameter to getDisplayMedia()

This would allow an application to always/never switch focus, but it would not allow the application to make a divergent decision based on what display-surface was selected by the user. The sample use case presented earlier showcases why that would not be a desirable limitation.

Alternate Approach 2: Focus Hand-Off (Capturer->Captured)

Allowing capturer->captured handoff of focus was considered and pitched to the WebRTC WG. This option is a bit scarier from a security perspective, as the capturing application could try to clickjack the user into pressing something problematic on the captured application at an inopportune moment. The risk seems limited, but it's still greater than with the current proposal.

Let's Discuss

Discussion welcome either here or on the relevant thread in the WebRTC WG.
