Abstract

When an application captures a [=display-surface=], the user agent faces a decision: should the captured [=display-surface=] be brought to the forefront of the user's screen ("focused"), or should the capturing application retain focus? This document proposes a mechanism by which an application can influence this decision.

Definitions

This document uses the definitions of the following concepts from [[SCREEN-CAPTURE]]: display-surface, application [=display-surface=], browser [=display-surface=], window [=display-surface=] and monitor [=display-surface=].

Problem Description

Assume a Web application that calls {{MediaDevices/getDisplayMedia()}}, and a user who chooses to capture a tab or a window. It is not currently specified whether the user agent should focus the captured [=display-surface=] or let the capturing application retain focus.

The user agent is mostly agnostic of the nature of the capturing and captured applications, and is therefore not well-positioned to make an informed decision with regard to focus.

In contrast, the capturing application is familiar with its own properties, and is better positioned to make this decision. Moreover, by reading {{MediaTrackConstraintSet/displaySurface}} and/or using Capture Handle, the capturing application can learn about the captured [=display-surface=], driving an even more informed decision.

For example, a video conferencing application may wish to:

The Conditional-Focus Mechanism

The conditional-focus mechanism allows the capturing application to instruct the user agent to either switch focus to the captured [=display-surface=], or to avoid such a focus change.

The application can make this decision only during a limited window of opportunity, which is defined in the focus() section. If the mechanism is not invoked within this window of opportunity, the user agent takes over and makes its own decision.

getDisplayMedia() Extensions

{{MediaDevices/getDisplayMedia()}} is currently defined such that it returns a {{Promise}}<{{MediaStream}}>. We extend this definition such that when {{MediaDevices/getDisplayMedia()}} is called, if the user elects to capture either an [=application=], [=browser=] or [=window=] [=display-surface=], the video track of the aforementioned {{MediaStream}} will be of type {{FocusableMediaStreamTrack}}.

FocusableMediaStreamTrack

{{MediaStreamTrack}} is subclassed as {{FocusableMediaStreamTrack}}.

          [Exposed=Window]
          interface FocusableMediaStreamTrack : MediaStreamTrack {
            undefined focus(CaptureStartFocusBehavior focus_behavior);
          };

          enum CaptureStartFocusBehavior {
            "focus-captured-surface",
            "no-focus-change"
          };
        
focus()

Recall that the {{FocusableMediaStreamTrack}} object was instantiated in response to a call to {{MediaDevices/getDisplayMedia()}}. That call to {{MediaDevices/getDisplayMedia()}} returned a {{Promise}}<{{MediaStream}}> PRMS. Like any {{Promise}}, PRMS is settled on a microtask, which we will name MT.

When MT starts executing, a window of opportunity opens for the application to inform the user agent as to whether it wants the captured [=display-surface=] to be focused or not. Calls to {{focus()}} may only have an effect while this window of opportunity is open. It closes as soon as one of the following happens:

  • {{focus()}} is called for the first time.
  • MT finishes.
  • One second passes since the capture was started.

When the window of opportunity closes, if an explicit decision was not made through calling {{focus()}}, then the user agent MUST make its own decision.

Therefore, when {{focus()}} is called, the user agent MUST run the following steps:

  1. If this object is a clone, raise an {{InvalidStateError}}. Otherwise, proceed.
  2. If {{focus()}} was previously called on [=this=], raise an {{InvalidStateError}}. Otherwise, proceed.
  3. If this call to {{focus()}} is not on MT, the user agent MUST have already made a decision, so raise an {{InvalidStateError}}. Otherwise, proceed.
  4. If this call to {{focus()}} occurs more than one second after the start of the capture, the user agent MUST have already made a decision. The user agent MUST silently ignore this call to {{focus()}}.
  5. This call to {{focus()}} occurred on MT and within one second of the capture starting. Therefore, the user agent MUST NOT make its own decision with respect to focusing the captured [=display-surface=], but rather:
    • If focus_behavior is set to {{CaptureStartFocusBehavior/"focus-captured-surface"}}, then the user agent MUST focus the captured [=display-surface=].
    • If focus_behavior is set to {{CaptureStartFocusBehavior/"no-focus-change"}}, then the user agent MUST NOT focus the captured [=display-surface=].
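The steps above can be read as a small state machine. The following is a non-normative sketch of that reading, not an implementation of any user agent. The class name FocusDecisionModel, the invalidStateError() helper, and the onSettlingMicrotask and nowMs parameters are all hypothetical; they stand in for state that a real user agent would track internally.

```javascript
// Illustrative model only. None of these names exist in any browser API.

// Stand-in for a DOMException named "InvalidStateError".
function invalidStateError(message) {
  const error = new Error(message);
  error.name = "InvalidStateError";
  return error;
}

const WINDOW_OF_OPPORTUNITY_MS = 1000; // "One second" from the steps above.

class FocusDecisionModel {
  constructor({ isClone = false, captureStartMs = 0 } = {}) {
    this.isClone = isClone;
    this.captureStartMs = captureStartMs;
    this.focusCalled = false;
    this.decision = null; // Stays null if the user agent decides by itself.
  }

  // onSettlingMicrotask: true if the call is made on the microtask MT.
  // nowMs: current time, on the same clock as captureStartMs.
  focus(focusBehavior, { onSettlingMicrotask, nowMs }) {
    // Step 1: clones cannot be used to make the decision.
    if (this.isClone) {
      throw invalidStateError("focus() called on a clone");
    }
    // Step 2: only the first call may have an effect.
    if (this.focusCalled) {
      throw invalidStateError("focus() was already called");
    }
    this.focusCalled = true;
    // Step 3: MT has finished, so the user agent has already decided.
    if (!onSettlingMicrotask) {
      throw invalidStateError("focus() called after MT finished");
    }
    // Step 4: too late, silently ignore the call.
    if (nowMs - this.captureStartMs > WINDOW_OF_OPPORTUNITY_MS) {
      return;
    }
    // Step 5: the application's decision is honored.
    this.decision = focusBehavior;
  }
}
```

In this model, a decision of null after the window closes corresponds to the case where the user agent makes its own decision.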

Usage Samples

All examples will assume a predicate named shouldFocus() which accepts a video {{MediaStreamTrack}} as input. It is a synchronous function returning either {{CaptureStartFocusBehavior/"no-focus-change"}} or {{CaptureStartFocusBehavior/"focus-captured-surface"}}.

            function shouldFocus(mediaStreamTrack) {
              // Synchronous.
              // Returns "no-focus-change" or "focus-captured-surface".
              // Has access to Capture Handle.
            }
        

Reasonable implementations of this predicate include:
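As a minimal sketch, one possible predicate bases its decision only on the {{MediaTrackConstraintSet/displaySurface}} setting of the track. The policy shown here (focus captured windows, keep focus for everything else) is an arbitrary illustration and not a recommendation of this document; a real application would choose a policy that fits its own use case, possibly also consulting Capture Handle.

```javascript
// Illustrative policy only; the choice of which surfaces to focus is an
// assumption made for the sake of the example.
function shouldFocus(mediaStreamTrack) {
  // getSettings() is synchronous, so the predicate stays synchronous.
  const { displaySurface } = mediaStreamTrack.getSettings();

  if (displaySurface === "window") {
    // Assumption: the user will want to interact with a captured window.
    return "focus-captured-surface";
  }
  // Any other surface type, or an unknown one: keep focus where it is.
  return "no-focus-change";
}
```

Because the predicate only reads settings that are already available on the track, it does not risk outliving the window of opportunity.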

Correct Usage Sample

            const mediaStream = await navigator.mediaDevices.getDisplayMedia();
            const [track] = mediaStream.getVideoTracks();
            if (!!track.focus) {
              track.focus(shouldFocus(track));  // Correct.
            }
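The feature check and the call can also be wrapped in a small synchronous helper, sketched below. The helper name requestFocusDecision and its return values are hypothetical. Since a late call may be silently ignored, the helper can only report that a decision was requested, not that it was honored.

```javascript
// Hypothetical helper; it must be called synchronously, right after the
// promise returned by getDisplayMedia() resolves.
function requestFocusDecision(track, decide) {
  // Feature detection: the track might not expose focus().
  if (typeof track.focus !== "function") {
    return "unsupported";
  }
  try {
    track.focus(decide(track));
    // The call did not throw, but it may still have been silently ignored.
    return "requested";
  } catch (error) {
    if (error && error.name === "InvalidStateError") {
      // The window of opportunity is closed, or the track is a clone.
      return "rejected";
    }
    throw error; // Unrelated failure; do not hide it.
  }
}
```

A caller would invoke it as requestFocusDecision(track, shouldFocus) in place of the if-block in the sample above.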
          

Incorrect Usage Samples

              const mediaStream = await navigator.mediaDevices.getDisplayMedia();
              const [track] = mediaStream.getVideoTracks();
              await someOtherFunction();  // Mistake: Allows MT to finish its execution.
              if (!!track.focus) {
                track.focus(shouldFocus(track));
              }
          
              const mediaStream = await navigator.mediaDevices.getDisplayMedia();
              const [track] = mediaStream.getVideoTracks();
              setTimeout(() => {  // Mistake: Allows MT to finish its execution.
                if (!!track.focus) {
                  track.focus(shouldFocus(track));
                }
              }, 1);
          
              const mediaStream = await navigator.mediaDevices.getDisplayMedia();
              const [track] = mediaStream.getVideoTracks();
              timeConsumingFunc();  // Mistake: Might take longer than 1s.
              if (!!track.focus) {
                track.focus(shouldFocus(track));
              }