Document Picture-in-Picture Specification

Draft Community Group Report,

This version:
https://wicg.github.io/document-picture-in-picture/
Issue Tracking:
GitHub
Inline In Spec
Editor:
(Google Inc.)

Abstract

This specification enables web developers to populate an HTMLDocument in an always-on-top window.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

This section is non-normative.

There currently exists a Web API for putting an HTMLVideoElement into a Picture-in-Picture window (requestPictureInPicture()). Because that API accepts only a video element, it limits a website’s ability to provide a custom picture-in-picture (PiP) experience. We want to expand upon that functionality by providing the website with a full Document on an always-on-top window.

This new window will be much like a blank same-origin window opened via the existing open() method on Window, with some minor differences:

2. Dependencies

The IDL fragments in this specification must be interpreted as required for conforming IDL fragments, as described in the Web IDL specification. [WEBIDL]

3. Security Considerations

3.1. Secure Context

The API is limited to [SECURE-CONTEXTS].

3.2. Spoofing

It is required that the user agent provides enough UI on the DocumentPictureInPicture window to prevent malicious websites from abusing the ability to float on top of other windows to spoof other websites or system UI.

3.2.1. Positioning

The user agent must prevent the website from setting the position of the window in order to prevent the website from purposefully positioning the window in a location that may trick a user into thinking it is part of another page’s UI. In particular, this means the moveTo() and moveBy() APIs must be disabled for document picture-in-picture windows.

3.2.2. Origin Visibility

It is required that the user agent makes it clear to the user which origin is controlling the DocumentPictureInPicture window at all times to ensure that the user is aware of where the content is coming from. For example, the user agent may display the origin of the website in a titlebar on the window.

3.2.3. Maximum size

The user agent should restrict the maximum size of the document picture-in-picture window to prevent the website from covering the screen with an always-on-top window and locking the user in the picture-in-picture window. This also helps prevent spoofing the user’s desktop.

3.3. IFrames

This API is only available on a top-level traversable. However, the DocumentPictureInPicture Window itself may contain HTMLIFrameElements, even cross-origin HTMLIFrameElements.
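
A page can check these availability conditions before offering a PiP control. The following is a minimal sketch, not part of the API: canRequestDocumentPiP is a hypothetical helper name, and it takes the window as a parameter so it can be exercised with plain objects. It does not cover the case where the caller is itself a document picture-in-picture window, which requestWindow() also rejects.

```javascript
// Hypothetical helper: reports whether requestWindow() is worth attempting.
// `win` is expected to be a Window, but any object with the same properties works.
function canRequestDocumentPiP(win) {
  // The attribute is [SecureContext], so it is absent on insecure pages.
  if (!win.isSecureContext || !('documentPictureInPicture' in win)) {
    return false;
  }
  // requestWindow() throws NotAllowedError outside a top-level traversable,
  // so treat framed documents as unsupported.
  return win.top === win;
}
```

In a page this would typically be called as canRequestDocumentPiP(window) before showing an "Enter PiP" button.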

4. Privacy Considerations

4.1. Fingerprinting

When a PiP window is closed and then later re-opened, it can be useful for the user agent to re-use the size and location of the previous PiP window to provide a smoother user experience. However, it is recommended that the user agent does not re-use the size or location across different origins, as this may provide malicious websites an avenue for fingerprinting a user.
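
One way to picture this recommendation is a cache of window bounds keyed by origin. The sketch below is purely illustrative and does not describe how any user agent stores this state; PlacementCache and its method names are hypothetical.

```javascript
// Hypothetical user-agent-side cache of the last PiP window bounds, keyed by
// the opener's origin so that one origin never observes another's placement.
class PlacementCache {
  constructor() {
    this.boundsByOrigin = new Map();
  }

  // Stores a copy of the bounds for the given origin.
  remember(origin, bounds) {
    this.boundsByOrigin.set(origin, { ...bounds });
  }

  // Returns the stored bounds for this origin only, or null when there are none.
  recall(origin) {
    const bounds = this.boundsByOrigin.get(origin);
    return bounds ? { ...bounds } : null;
  }
}
```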

5. API

[Exposed=Window]
partial interface Window {
  [SameObject, SecureContext] readonly attribute DocumentPictureInPicture
    documentPictureInPicture;
};

[Exposed=Window, SecureContext]
interface DocumentPictureInPicture : EventTarget {
  [NewObject] Promise<Window> requestWindow(
    optional DocumentPictureInPictureOptions options = {});
  readonly attribute Window window;
  attribute EventHandler onenter;
};

dictionary DocumentPictureInPictureOptions {
  [EnforceRange] unsigned long long width = 0;
  [EnforceRange] unsigned long long height = 0;
  boolean disallowReturnToOpener = false;
  boolean preferInitialWindowPlacement = false;
};

[Exposed=Window, SecureContext]
interface DocumentPictureInPictureEvent : Event {
  constructor(DOMString type, DocumentPictureInPictureEventInit eventInitDict);
  [SameObject] readonly attribute Window window;
};

dictionary DocumentPictureInPictureEventInit : EventInit {
  required Window window;
};

A DocumentPictureInPicture object allows websites to create and open a new always-on-top Window as well as listen for events related to opening and closing that Window.

Each Window object has an associated documentPictureInPicture API, which is a new DocumentPictureInPicture instance created alongside the Window.

The documentPictureInPicture getter steps are:
  1. Return this’s documentPictureInPicture API.

Each DocumentPictureInPicture object has an associated last-opened window which is a Window object that is initially null and is set as part of the requestWindow() method steps.

The window getter steps are:
  1. Let win be this’s last-opened window.

  2. If win is not null and win’s closed attribute is false, return win.

  3. Return null.
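
Because the getter returns null once the window has been closed, a page can use it to decide whether a new request is needed. The following is a minimal sketch under that reading; getOrRequestPipWindow is a hypothetical helper, and dpip stands in for window.documentPictureInPicture. Note that calling requestWindow() while a window is open closes the existing one, so checking first preserves it.

```javascript
// Hypothetical helper: resolves with the open PiP window if there is one,
// otherwise asks for a new one (which requires transient activation).
function getOrRequestPipWindow(dpip, options = {}) {
  const existing = dpip.window; // null when nothing is open
  if (existing) {
    return Promise.resolve(existing);
  }
  return dpip.requestWindow(options);
}
```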

The requestWindow(options) method steps are:
  1. If Document Picture-in-Picture support is false, throw a "NotSupportedError" DOMException.

  2. If this’s relevant global object’s navigable is not a top-level traversable, throw a "NotAllowedError" DOMException.

  3. If this’s relevant global object’s navigable’s Is Document Picture-in-Picture boolean is true, throw a "NotAllowedError" DOMException.

  4. If this’s relevant global object does not have transient activation, throw a "NotAllowedError" DOMException.

  5. If options["width"] exists and is greater than zero, but options["height"] does not exist or is zero, throw a RangeError.

  6. If options["height"] exists and is greater than zero, but options["width"] does not exist or is zero, throw a RangeError.

  7. Consume user activation given this’s relevant global object.

  8. Let win be this’s last-opened window. If win is not null and win’s closed attribute is false, then close win’s navigable.

  9. Optionally, the user agent can close any existing picture-in-picture windows.

  10. Let pip traversable be the result of creating a new top-level traversable given this’s relevant global object’s navigable’s active browsing context and "_blank".

The resulting Document's URL will be `about:blank`, but its document base URL will fall back to be that of the initiator that called requestWindow(). Some browsers do not implement this fallback behavior for normal `about:blank` popups; see whatwg/html#421 for discussion. Implementers are advised to make sure this inheritance happens as specified for document picture-in-picture windows, to avoid further interop problems.

  11. Set pip traversable’s active document’s mode to this’s relevant global object’s associated Document’s mode.

  12. Set pip traversable’s Is Document Picture-in-Picture boolean to true.

  13. If options["width"] exists and is greater than zero:

    1. Optionally, clamp or ignore options["width"] if it is too large or too small in order to fit a user-friendly window size.

    2. Optionally, size pip traversable’s active browsing context’s window such that the distance between the left and right edges of the viewport is options["width"] pixels.

  14. If options["height"] exists and is greater than zero:

    1. Optionally, clamp or ignore options["height"] if it is too large or too small in order to fit a user-friendly window size.

    2. Optionally, size pip traversable’s active browsing context’s window such that the distance between the top and bottom edges of the viewport is options["height"] pixels.

If options["preferInitialWindowPlacement"] exists and is true, then the user agent may use this hint to prefer behavior that is similar to steps 13 and 14, rather than considering any previous position or size of any previously closed pip traversable window.

  15. If options["disallowReturnToOpener"] exists and is true, the user agent should not display UI affordances on the picture-in-picture window that allow the user to return to the opener window.

For both video and document picture-in-picture, user agents often display a button for the user to return to the original page and close the picture-in-picture window. While this action makes sense in most cases (especially for a video picture-in-picture window that returns the video to the main document), it does not always make sense for document picture-in-picture windows. disallowReturnToOpener is a hint to the user agent from the website as to whether that action makes sense for their particular document picture-in-picture experience.

  16. Configure pip traversable’s active browsing context’s window to float on top of other windows.

  17. Set this’s last-opened window to pip traversable’s active window.

  18. Queue a global task on the DOM manipulation task source given this’s relevant global object to fire an event named enter using DocumentPictureInPictureEvent on this with its window attribute initialized to pip traversable’s active window.

  19. Return pip traversable’s active window.

While the size of the window can be configured by the website, the initial position is left to the discretion of the user agent.
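
The precondition checks in steps 1 to 6 of the requestWindow() method steps can be sketched as a pure function. This is an illustrative model only: requestWindowError and every field on state are hypothetical stand-ins for the spec concepts, and the function reports the error name rather than throwing.

```javascript
// Models steps 1-6 of requestWindow(): returns the name of the error that
// would be thrown, or null when the request may proceed.
function requestWindowError(state, options = {}) {
  if (!state.supported) return 'NotSupportedError';
  if (!state.isTopLevel) return 'NotAllowedError';
  if (state.isDocumentPiP) return 'NotAllowedError';
  if (!state.hasTransientActivation) return 'NotAllowedError';
  // Both members default to 0 in the IDL, so "missing" and "zero" are treated alike.
  const width = options.width ?? 0;
  const height = options.height ?? 0;
  // A positive width without a positive height (or vice versa) is rejected.
  if ((width > 0) !== (height > 0)) return 'RangeError';
  return null;
}
```

The practical takeaway for a page is to pass both width and height or neither, and to call requestWindow() from a user gesture handler.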

enter

Fired on DocumentPictureInPicture when a PiP window is opened.
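
A page can observe this event to react whenever a PiP window opens, regardless of which code path requested it. The following is a minimal sketch; watchPipOpen and onOpen are hypothetical names, and dpip stands in for window.documentPictureInPicture (it only needs to be an EventTarget).

```javascript
// Calls `onOpen` with the PiP Window each time an `enter` event fires.
function watchPipOpen(dpip, onOpen) {
  const listener = (event) => onOpen(event.window);
  dpip.addEventListener('enter', listener);
  // Return a function that removes the listener again.
  return () => dpip.removeEventListener('enter', listener);
}
```

In a page, watchPipOpen(documentPictureInPicture, (pipWindow) => { ... }) would receive the same Window that the requestWindow() promise resolves with.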

6. Concepts

6.1. Document Picture-in-Picture Support

Each user agent has a Document Picture-in-Picture Support boolean, whose value is implementation-defined (and might vary according to user preferences).

6.2. DocumentPictureInPicture Window

Each top-level traversable has an Is Document Picture-in-Picture boolean, whose value defaults to false, but can be set to true in the requestWindow() method steps.

6.3. Closing a Document Picture-in-Picture window

Merge this into close once it has enough consensus.

Modify step 2 of close, "If the result of checking if unloading is user-canceled for toUnload is true, then return." to be:

  1. If traversable’s Is Document Picture-in-Picture boolean is true, then skip this step. Otherwise, if the result of checking if unloading is user-canceled for toUnload is true, then return.

6.4. Close any existing PiP windows

To close any existing picture-in-picture windows:

  1. For each top-level traversable of the user agent’s top-level traversable set:

    1. If top-level traversable’s Is Document Picture-in-Picture boolean is true, then close top-level traversable.

    2. If top-level traversable’s active document’s pictureInPictureElement is not null, run the exit Picture-in-Picture algorithm with top-level traversable’s active document.

    3. For each navigable of top-level traversable’s active document’s descendant navigables:

      1. If navigable’s active document’s pictureInPictureElement is not null, run the exit Picture-in-Picture algorithm with navigable’s active document.

6.5. One PiP Window

Any top-level traversable must have at most one document picture-in-picture window open at a time. If a top-level traversable whose active window’s documentPictureInPicture API’s last-opened window is not null tries to open another document picture-in-picture window, the user agent must close the existing last-opened window as described in the requestWindow() method steps.

However, whether only one window is allowed in Picture-in-Picture mode across all top-level traversables is left to the implementation and the platform. As such, what happens when there is a Picture-in-Picture request while there is a top-level traversable whose Is Document Picture-in-Picture boolean is true or whose active document’s pictureInPictureElement is not null will be left as an implementation detail: the user agent could close any existing picture-in-picture windows or multiple Picture-in-Picture windows could be created.

6.6. Closing the PiP window when either the original or PiP document is destroyed

To close any associated Document Picture-in-Picture windows given a Document document:

  1. Let navigable be document’s node navigable.

  2. If navigable is not a top-level traversable, abort these steps.

  3. If navigable’s Is Document Picture-in-Picture boolean is true, then close navigable and abort these steps.

  4. Let win be navigable’s active window’s documentPictureInPicture API’s last-opened window.

  5. If win is not null and win’s closed attribute is false, then close win’s navigable.
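
The steps above can be sketched as a function that reports which traversable would be closed. This is a toy model, not an implementation: pipToClose and the fields isTopLevel, isDocumentPiP, lastOpenedPipWindow, and navigable are hypothetical stand-ins for the spec concepts.

```javascript
// Returns the traversables that would be closed for the given navigable.
function pipToClose(navigable) {
  if (!navigable.isTopLevel) return [];             // step 2
  if (navigable.isDocumentPiP) return [navigable];  // step 3: the PiP window itself
  const win = navigable.lastOpenedPipWindow;        // step 4
  if (win !== null && !win.closed) {
    return [win.navigable];                         // step 5: the opener's PiP window
  }
  return [];
}
```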

Merge this into destroy once it has enough consensus.

Add a step 10 to the end of destroy:

  1. Close any associated Document Picture-in-Picture windows given document.

This ensures that when a page with an open Document Picture-in-Picture window is closed, then its PiP window is closed as well.

6.7. Closing the PiP window when either the original or PiP document is navigated

Merge this into navigate once it has enough consensus.

Modify step 16.3 of navigate, "Queue a global task on the navigation and traversal task source given navigable’s active window to abort navigable’s active document.", and also insert a step 16.4 immediately after it:

  1. Queue a global task on the navigation and traversal task source given navigable’s active window to abort navigable’s active document and close any associated Document Picture-in-Picture windows given navigable’s active document.

  2. If navigable is a top-level traversable whose Is Document Picture-in-Picture boolean is true, then abort these steps.

This ensures that when a page with an open Document Picture-in-Picture window is navigated, then its PiP window is closed as well. It also ensures that when the document in a Document Picture-in-Picture window is navigated, the Document Picture-in-Picture window is closed.

6.8. Resizing the PiP window

While programmatically resizing a document picture-in-picture window can be useful, the always-on-top nature of the window means an unrestricted ability to resize the window could be abused in annoying or intrusive ways. To mitigate these concerns without completely preventing the use of window resize APIs, we will have those APIs consume a user gesture for document picture-in-picture windows.

Merge this into resizeTo() once it has enough consensus.

Add a new step to resizeTo() after step 3, "If target is not an auxiliary browsing context that was created by a script (as opposed to by an action of the user), then return.":

  1. If target’s top-level traversable’s Is Document Picture-in-Picture boolean is true, then:

    1. If this’s relevant global object does not have transient activation, throw a "NotAllowedError" DOMException.

    2. Consume user activation given this’s relevant global object.

Merge this into resizeBy() once it has enough consensus.

Add a new step to resizeBy() after step 3, "If target is not an auxiliary browsing context that was created by a script (as opposed to by an action of the user), then return.":

  1. If target’s top-level traversable’s Is Document Picture-in-Picture boolean is true, then:

    1. If this’s relevant global object does not have transient activation, throw a "NotAllowedError" DOMException.

    2. Consume user activation given this’s relevant global object.
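
One consequence of the added step is that a single user gesture permits a single programmatic resize. The following toy model illustrates this; gateResize, target.isDocumentPiP, and activation.transient are hypothetical names, and a plain Error with its name set stands in for the DOMException.

```javascript
// Models the added resizeTo()/resizeBy() step for one call.
function gateResize(target, activation) {
  if (!target.isDocumentPiP) return true; // other windows are unaffected
  if (!activation.transient) {
    const error = new Error('resize requires a user gesture');
    error.name = 'NotAllowedError'; // stands in for the DOMException
    throw error;
  }
  activation.transient = false; // consume user activation
  return true;
}
```

Under this model, a click handler that needs to change both dimensions should do so in one resizeTo() or resizeBy() call rather than two.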

6.9. Focusing the opener window

It can often be useful for the picture-in-picture window to be able to re-focus its opener tab, e.g. when the smaller form-factor of the window doesn’t fit the experience the user needs. We modify the focus() API to allow it to take system-level focus when a picture-in-picture window is focusing its opener.

Merge this into focus() once it has enough consensus.

Add a new step to focus() after step 3, "Run the focusing steps with current.":

  1. If current is a top-level traversable, then:

    1. Let pipWindow be current’s active window’s documentPictureInPicture API’s last-opened window.

    2. If pipWindow is not null and pipWindow’s relevant global object has transient activation, then:

      1. Consume user activation given pipWindow’s relevant global object.

      2. Give current system focus.

Giving system focus to the opener does not necessarily need to close the document picture-in-picture window. If the website wants to close the document picture-in-picture window after focusing, they can always do so using close() on the document picture-in-picture window itself.

6.10. CSS display-mode

The CSS display mode media feature picture-in-picture lets web developers write specific CSS rules that are only applied when (part of) the web app is shown in picture-in-picture mode.
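
The same media feature can also be queried from script through the standard matchMedia() method. The following is a minimal sketch; isPipDisplayMode is a hypothetical helper name, and win is expected to be the Window whose document should be tested.

```javascript
// Reports whether the given window is currently shown in picture-in-picture mode.
function isPipDisplayMode(win) {
  return win.matchMedia('(display-mode: picture-in-picture)').matches;
}
```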

6.11. User activation propagation

Due to the nature of document picture-in-picture windows, event handlers on buttons within the window often end up actually running in the opener’s context. This can make it unergonomic for websites to call activation consuming APIs, since sometimes the document picture-in-picture window has transient activation while the opener does not.

To make this easier, we will update the activation notification steps to also trigger user activation in the opener when triggering user activation in a document picture-in-picture window. Additionally, when user activation is triggered in the opener, we will activate same-origin frames inside the document picture-in-picture window, similar to how same-origin descendant frames are activated.

Merge this into activation notification steps once it has enough consensus.

Add three new steps to activation notification after step 4, "Extend windows with the active window of each of document’s descendant navigables, filtered to include only those navigables whose active document’s origin is same origin with document’s origin":

  1. If document’s node navigable’s top-level traversable’s Is Document Picture-in-Picture boolean is true, then extend windows with document’s node navigable’s top-level traversable’s active browsing context’s opener browsing context’s active window.

  2. Let document picture-in-picture window be document’s node navigable’s top-level traversable’s active window’s documentPictureInPicture API’s last-opened window.

  3. If document picture-in-picture window is not null then extend windows with the active window of each of document picture-in-picture window’s associated document’s descendant navigables, filtered to include only those navigables whose active document’s origin is same origin with document picture-in-picture window’s associated document’s origin.

Additionally, we need to make sure that this activation is properly consumed so it can’t be used twice (once in the opener and once in the picture-in-picture window). We do this by adding steps to consume user activation which consume user activation from the opener when consuming a picture-in-picture window’s user activation, and consuming an associated picture-in-picture window’s user activation when consuming an opener’s user activation.

Merge this into consume user activation steps once it has enough consensus.

Add three new steps to consume user activation after step 3, "Let navigables be the inclusive descendant navigables of top’s active document.":

  1. If top’s Is Document Picture-in-Picture boolean is true, then extend navigables with the inclusive descendant navigables of top’s active browsing context’s opener browsing context’s active document.

  2. Let document picture-in-picture window be top’s active window’s documentPictureInPicture API’s last-opened window.

  3. If document picture-in-picture window is not null then extend navigables with the inclusive descendant navigables of document picture-in-picture window’s associated document.
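
The net effect of these changes is that the opener and its picture-in-picture window gain and lose activation together. The toy model below illustrates only that pairing: it ignores descendant navigables and the same-origin filtering, and the names linkedWindows, notifyActivation, consumeActivation, and the active, opener, and pip fields are all hypothetical.

```javascript
// Each window is a plain object with a boolean `active` flag.
// `pip.opener` and `opener.pip` link the pair.
function linkedWindows(win) {
  const linked = [win];
  if (win.opener) linked.push(win.opener); // win is the PiP window
  if (win.pip) linked.push(win.pip);       // win is the opener
  return linked;
}

// A gesture in either window activates both.
function notifyActivation(win) {
  for (const w of linkedWindows(win)) w.active = true;
}

// Consuming in either window consumes in both, so a gesture is used once.
function consumeActivation(win) {
  for (const w of linkedWindows(win)) w.active = false;
}
```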

7. Examples

This section is non-normative.

7.1. Extracting a video player into PiP

7.1.1. HTML

<body>
  <div id="player-container">
    <div id="player">
      <video id="video" src="foo.webm"></video>
      <!-- More player elements here. -->
    </div>
  </div>
  <input type="button" onclick="enterPiP();" value="Enter PiP" />
</body>

7.1.2. JavaScript

// Handle to the picture-in-picture window.
let pipWindow = null;

function enterPiP() {
  const player = document.querySelector('#player');

  // Set the width/height so the window is properly sized to the video.
  const pipOptions = {
    width: player.clientWidth,
    height: player.clientHeight,
  };

  documentPictureInPicture.requestWindow(pipOptions).then((pipWin) => {
    pipWindow = pipWin;

    // Style remaining container to imply the player is in PiP.
    const playerContainer = document.querySelector('#player-container');
    playerContainer.classList.add('pip-mode');

    // Add player to the PiP window.
    pipWindow.document.body.append(player);

    // Listen for the PiP closing event to put the video back.
    pipWindow.addEventListener('pagehide', onLeavePiP.bind(pipWindow), { once: true });
  });
}

// Called when the PiP window has closed.
function onLeavePiP() {
  if (this !== pipWindow) {
    return;
  }

  // Remove PiP styling from the container.
  const playerContainer = document.querySelector('#player-container');
  playerContainer.classList.remove('pip-mode');

  // Add the player back to the main window.
  const player = pipWindow.document.querySelector('#player');
  playerContainer.append(player);

  pipWindow = null;
}

7.2. Accessing elements on the PiP Window

const video = pipWindow.document.querySelector('#video');
video.loop = true;

7.3. Listening to events on the PiP Window

As part of creating an improved picture-in-picture experience, websites will often want to add custom buttons and controls that respond to user input events such as clicks.

const pipDocument = pipWindow.document;
const video = pipDocument.querySelector('#video');
const muteButton = pipDocument.createElement('button');
muteButton.textContent = 'Toggle mute';
muteButton.addEventListener('click', () => {
  video.muted = !video.muted;
});
pipDocument.body.append(muteButton);

7.4. Exiting PiP

The website may want to close the DocumentPictureInPicture Window without the user explicitly clicking on the window’s close button. They can do this by using the close() method on the Window object:

// This will close the PiP window and trigger our existing onLeavePiP()
// listener.
pipWindow.close();

7.5. Getting elements out of the PiP window when it closes

When the PiP window is closed for any reason (either because the website initiated it or the user closed it), the website will often want to get the elements back out of the PiP window. The website can perform this in an event handler for the pagehide event on the Window object. This is shown in the onLeavePiP() handler in the video player example above and is copied below:

// Called when the PiP window has closed.
function onLeavePiP() {
  if (this !== pipWindow) {
    return;
  }

  // Remove PiP styling from the container.
  const playerContainer = document.querySelector('#player-container');
  playerContainer.classList.remove('pip-mode');

  // Add the player back to the main window.
  const player = pipWindow.document.querySelector('#player');
  playerContainer.append(player);

  pipWindow = null;
}

7.6. Programmatically resize the PiP window

The document picture-in-picture window supports the resizeTo() and resizeBy() APIs, but only with a user gesture on the PiP window:

const expandButton = pipWindow.document.createElement('button');
expandButton.textContent = 'Expand PiP Window';
expandButton.addEventListener('click', () => {
  // Expand the PiP window’s width by 20px and height by 30px.
  pipWindow.resizeBy(20, 30);
});
pipWindow.document.body.append(expandButton);

7.7. Return to the opener tab

The focus() API can be used to focus the opener tab from a picture-in-picture window (requiring a user gesture):

const returnToTabButton = pipWindow.document.createElement('button');
returnToTabButton.textContent = 'Return to opener tab';
returnToTabButton.addEventListener('click', () => {
  window.focus();
});
pipWindow.document.body.append(returnToTabButton);

7.8. CSS picture-in-picture display mode usage

The following example shows how to remove margins on the body element and reduce the font size of titles in the PiP window so that the content fits better:

@media all and (display-mode: picture-in-picture) {
  body {
    margin: 0;
  }
  h1 {
    font-size: 0.8em;
  }
}

7.9. Hide return-to-opener button

While user agents often display a button on their video and document picture-in-picture windows to return to the opener and close the window, this button doesn’t always make sense for some websites' document picture-in-picture experience. Use the disallowReturnToOpener option to hide the button.

await documentPictureInPicture.requestWindow({
  disallowReturnToOpener: true
});

7.10. Prefer initial window placement

While a document picture-in-picture window is open, the user may manually resize or reposition it. If the document picture-in-picture window is closed, then reopened later, the user agent may use the previous position and size as a hint for where to place the new window rather than opening it in its original, default position.

The site can provide a hint to the user agent that reusing the previous document picture-in-picture window position and size is not desirable by setting the preferInitialWindowPlacement value to true. For example, if the site is requesting the new document picture-in-picture window for an activity unrelated to the previous one, then the site might provide this hint to the user agent. In response, the user agent may choose to use the default position, the default size, or the size hint provided by the site instead.

await documentPictureInPicture.requestWindow({
  preferInitialWindowPlacement: true
});

8. Acknowledgments

Many thanks to Frank Liberato, Mark Foltz, Klaus Weidner, François Beaufort, Charlie Reis, Joe DeBlasio, Domenic Denicola, and Yiren Wang for their comments and contributions to this document and to the discussions that have informed it.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[CSSOM-VIEW-1]
Simon Pieters. CSSOM View Module. URL: https://drafts.csswg.org/cssom-view/
[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[MEDIAQUERIES-5]
Dean Jackson; et al. Media Queries Level 5. URL: https://drafts.csswg.org/mediaqueries-5/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

Informative References

[SECURE-CONTEXTS]
Mike West. Secure Contexts. URL: https://w3c.github.io/webappsec-secure-contexts/
