Conditional Focus (When Display-Capture Starts) #37

# Conditional Focus

## Problem
When an application starts capturing a display-surface, the user agent faces a decision: should the captured display-surface be brought to the forefront, or should the capturing application retain focus?

The user agent knows little about the nature of the capturing and captured applications, and is therefore ill-positioned to make an informed decision.

In contrast, the capturing application is familiar with its own properties, and is therefore better suited to make this decision. Moreover, by reading [displaySurface](https://www.w3.org/TR/screen-capture/#dom-mediatrackconstraintset-displaysurface) and/or using [Capture Handle](https://wicg.github.io/capture-handle/), the capturing application can learn about the captured display-surface, driving an even more informed decision.

## Sample Use Case
For **example**, a video conferencing application **may** wish to:
* Focus a captured application that users typically interact with during the call, like a text editor.
* Retain focus for itself when the captured display-surface is non-interactive content, like a video.
  * (Using [Capture Handle](https://wicg.github.io/capture-handle/), the capturing application may even allow the user to remotely start/pause the video.)

## Proposed Solution
* Recall that [getDisplayMedia()](https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getDisplayMedia) returns a [Promise](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise)<[MediaStream](https://developer.mozilla.org/en-US/docs/Web/API/MediaStream/MediaStream)>, and that said [MediaStream](https://developer.mozilla.org/en-US/docs/Web/API/MediaStream/MediaStream) is guaranteed to contain at least one video track.
* When [getDisplayMedia()](https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getDisplayMedia) is called and the user approves the capture of a tab or a window, the video track will be of type `FocusableMediaStreamTrack`, subclassing [MediaStreamTrack](https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamTrack).
* `FocusableMediaStreamTrack` exposes a `focus()` method.
* This method may only be called on the microtask on which the aforementioned [Promise](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Promise) was resolved. Later invocations of `focus()` produce an error.
* Calls to `focus()` that occur more than 1s after capture started are silently ignored. This prevents the application from busy-waiting on the aforementioned microtask and then calling `focus()` much later.
* Calling `focus("no-focus-change")` leads to focus being retained by the capturing application. Calling `focus("focus-captured-surface")` immediately switches focus to the captured tab/window.
* Not calling `focus()` at all, or calling it too late, leaves the decision in the hands of the user agent.
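To make the timing rules above concrete, here is a small model of them in JavaScript. It is a hypothetical simulation, not part of the proposal and not how Chrome implements it: the `FocusDecisionModel` class, its injectable clock, and the use of `queueMicrotask()` to approximate "the microtask on which the Promise was resolved" are all assumptions made for illustration.

```javascript
// Hypothetical model of the proposed timing rules; not a browser API.
const FOCUS_WINDOW_MS = 1000; // calls later than 1s after capture start are ignored

class FocusDecisionModel {
  // `now` is an injectable clock so that a busy-wait can be simulated.
  constructor(now = () => Date.now()) {
    this.now = now;
    this.captureStart = now();
    this.onResolvingMicrotask = true;
    this.decision = null; // null: the decision is left to the user agent
    // Approximation: once the current microtask ends, focus() becomes an error.
    queueMicrotask(() => {
      this.onResolvingMicrotask = false;
    });
  }

  focus(behavior) {
    if (behavior !== "focus-captured-surface" && behavior !== "no-focus-change") {
      throw new TypeError(`unknown focus behavior: ${behavior}`);
    }
    if (!this.onResolvingMicrotask) {
      // Later invocations produce an error.
      throw new Error("focus() called after the resolving microtask");
    }
    if (this.now() - this.captureStart > FOCUS_WINDOW_MS) {
      // Still on the same microtask (e.g. after a busy-wait), but too late:
      // silently ignored, leaving the decision to the user agent.
      return;
    }
    this.decision = behavior;
  }
}

// Usage: a synchronous call right after "resolution" is honored.
const model = new FocusDecisionModel();
model.focus("no-focus-change");
console.log(model.decision); // no-focus-change
```

In this model, a call made on a later microtask throws, while a call made on the right microtask but after a busy-wait longer than 1s is silently dropped, mirroring the two rules listed above.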

## Suggested-spec
See [spec-draft](https://eladalon1983.github.io/conditional-focus/index.html) for the full description of the suggested solution.

## Demo
* This solution is implemented in Chrome starting with m95. It is gated by `--enable-blink-features=ConditionalFocus`. (Alternatively, enable `Experimental Web Platform features` in chrome://flags.)
* A [demo](https://eladalon1983.github.io/conditional-focus/demo/) is available. It works with Chrome m95 and up.

## Sample Code
```
const stream = await navigator.mediaDevices.getDisplayMedia();
const [track] = stream.getVideoTracks();
// Only tab/window captures yield a FocusableMediaStreamTrack; for other
// surfaces (or without the flag) there is no focus() method to call.
// Call it immediately, before any other await, or it will be rejected.
if (typeof track.focus === 'function') {
  track.focus(ShouldFocus(track) ? "focus-captured-surface" : "no-focus-change");
}

function ShouldFocus(track) {
  // sampleUsesCaptureHandle and ShouldFocusOrigin are placeholders.
  if (sampleUsesCaptureHandle) {
    // Assume logic discriminating focusability by origin,
    // for instance focusing anything except https://collaborator.com.
    const captureHandle = track.getCaptureHandle();
    return ShouldFocusOrigin(captureHandle && captureHandle.origin);
  } else {  // Assume Capture Handle is not a thing.
    // Assume the application is only interested in focusing tabs, not windows.
    return track.getSettings().displaySurface === 'browser';
  }
}
```
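The sample code leaves `ShouldFocusOrigin()` undefined. One possible implementation of the policy described in its comment (focus anything except `https://collaborator.com`) is sketched below. The function body, the `NO_FOCUS_ORIGINS` set, and the choice to focus when no origin is available are assumptions for illustration, not part of the proposal.

```javascript
// Hypothetical helper for the sample code's ShouldFocusOrigin() placeholder.
// Origins whose captures should NOT be focused (illustrative value only).
const NO_FOCUS_ORIGINS = new Set(["https://collaborator.com"]);

function ShouldFocusOrigin(origin) {
  // The origin may be missing, e.g. when getCaptureHandle() returned null
  // because the captured page set no Capture Handle. Treat that as
  // "anything else" and focus the captured surface.
  if (!origin) {
    return true;
  }
  return !NO_FOCUS_ORIGINS.has(origin);
}

console.log(ShouldFocusOrigin("https://collaborator.com")); // false
console.log(ShouldFocusOrigin("https://docs.example.com")); // true
```

Keeping the exceptions in a set makes it easy to extend the list of origins whose captures should leave focus with the capturing application.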

## Security Concerns
One noteworthy security concern is that allowing focus to be switched at an arbitrary moment could enable clickjacking attacks. The [suggested spec](https://eladalon1983.github.io/conditional-focus/index.html) addresses this concern by limiting the time when focus-switching may be triggered/suppressed.

## Alternate Approaches
### Alternate Approach 1: Extra Parameter to getDisplayMedia()
This would allow an application to always/never switch focus, but it would not allow the application to make a divergent decision based on what display-surface was selected by the user. The sample use case presented earlier showcases why that would not be a desirable limitation.

### Alternate Approach 2: Focus Hand-Off (Capturer->Captured)
Allowing capturer->captured handoff of focus was considered and [pitched to the WebRTC WG](https://github.com/w3c/mediacapture-screen-share/issues/165). This option is a bit scarier from a security perspective, as the capturing application could try to clickjack the user into pressing something problematic on the captured application at an inopportune moment. The risk seems limited, but it's still greater than with the current proposal.

## Let's Discuss
Discussion welcome either here or on the [relevant thread](https://github.com/w3c/mediacapture-screen-share/issues/190) in the WebRTC WG.
