Media Session Standard

Editor’s Draft,

This version:
https://wicg.github.io/mediasession
Editors:
(Google Inc.)
(Google Inc.)
Participate:
File an issue (open issues)
Version History:
https://github.com/WICG/mediasession/commits

Abstract

This specification enables web developers to customize the media metadata shown in platform UI, customize the available platform media controls, and access platform media keys such as hardware keys found on keyboards, headsets, and remote controls, and software keys found in notification areas and on lock screens of mobile devices.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

This section is non-normative.

Media is used extensively today, and the Web is one of the primary means of consuming media content. Many platforms can display media metadata, such as title, artist, album and album art, on various UI surfaces such as notifications, media control centers, device lock screens and wearable devices. This specification aims to enable web pages to specify the media metadata to be displayed in platform UI, and to respond to media controls which may come from platform UI or media keys, thereby improving the user experience.

2. Conformance

All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. For readability, these words do not appear in all uppercase letters in this specification. [RFC2119]

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and terminate these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.

When a method or an attribute is said to call another method or attribute, the user agent must invoke its internal API for that attribute or method so that e.g. the author can’t change the behavior by overriding attributes or methods with custom properties or functions in JavaScript.

Unless otherwise stated, string comparisons are done in a case-sensitive manner.

3. Dependencies

The IDL fragments in this specification must be interpreted as required for conforming IDL fragments, as described in the Web IDL specification. [WEBIDL]

4. The MediaSession interface

[Exposed=(Window)]
partial interface Navigator {
  readonly attribute MediaSession mediaSession;
};

[Exposed=Window]
interface MediaSession : EventTarget {
  attribute MediaMetadata? metadata;

  attribute EventHandler onplay;
  attribute EventHandler onpause;
  attribute EventHandler onplaypause;
  attribute EventHandler onprevioustrack;
  attribute EventHandler onnexttrack;
  attribute EventHandler onseekbackward;
  attribute EventHandler onseekforward;
};

The mediaSession attribute retrieves an instance of the MediaSession interface. The attribute MUST return the MediaSession instance associated with the Navigator object.

MediaSession objects are simply known as media sessions.

A media session has metadata, which is either a MediaMetadata object or null.

session . metadata
Returns the media session’s MediaMetadata object, if any, or null otherwise.

Can be set, to a MediaMetadata object or null.

The metadata attribute, on getting, MUST return the media session’s metadata.

The interface also defines a set of event handlers used for media controls. These event handlers are defined in §7 Media Controls.
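In script, the session is reached through navigator.mediaSession. A minimal, feature-detected sketch (the guard is only for illustration; it makes the snippet a no-op in environments without the API):

```javascript
// Obtain the page's media session, or null where the API is absent.
// The guard makes this safe outside browsers (e.g. in Node or workers).
const session =
  (typeof navigator !== "undefined" && "mediaSession" in navigator)
    ? navigator.mediaSession
    : null;

if (session) {
  // Reading the attribute returns the session's metadata, or null.
  console.log(session.metadata);
}
```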

5. The MediaMetadata interface

[Constructor(optional MediaMetadataInit init)]
interface MediaMetadata {
  readonly attribute DOMString title;
  readonly attribute DOMString artist;
  readonly attribute DOMString album;
  [SameObject] readonly attribute FrozenArray<MediaImage> artwork;
};

dictionary MediaMetadataInit {
  DOMString title = "";
  DOMString artist = "";
  DOMString album = "";
  sequence<MediaImageInit> artwork = [];
};

A MediaMetadata object has a title, an artist, an album and a FrozenArray of artwork images.

The MediaMetadata(init) constructor, when invoked, MUST run the following steps:

  1. Let metadata be a new MediaMetadata object.
  2. Set metadata’s title to init’s title.
  3. Set metadata’s artist to init’s artist.
  4. Set metadata’s album to init’s album.
  5. For each entry in init’s artwork, create a MediaImage by passing the entry to the MediaImage(init) constructor, and set metadata’s artwork to the resulting list.
  6. Return metadata.

The title attribute MUST return the MediaMetadata object’s title.

The artist attribute MUST return the MediaMetadata object’s artist.

The album attribute MUST return the MediaMetadata object’s album.

The artwork attribute MUST return the MediaMetadata object’s artwork images, as a FrozenArray of MediaImages. The artwork attribute can be empty.
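The defaulting behavior of MediaMetadataInit and MediaImageInit can be illustrated with a plain-object sketch; makeMetadata is a hypothetical helper, not part of the API (in a browser one would simply call new MediaMetadata(init)):

```javascript
// Hypothetical helper mirroring the MediaMetadataInit defaults;
// the real constructor is `new MediaMetadata(init)` in the browser.
function makeMetadata(init = {}) {
  return {
    title: init.title ?? "",
    artist: init.artist ?? "",
    album: init.album ?? "",
    // Each artwork entry gets the MediaImageInit defaults applied,
    // and the resulting list is frozen like a FrozenArray.
    artwork: Object.freeze((init.artwork ?? []).map(img => ({
      src: img.src ?? "",
      sizes: img.sizes ?? "",
      type: img.type ?? ""
    })))
  };
}
```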

6. The MediaImage interface

[Constructor(optional MediaImageInit init)]
interface MediaImage {
  readonly attribute USVString src;
  readonly attribute DOMString sizes;
  readonly attribute DOMString type;
};

dictionary MediaImageInit {
  USVString src = "";
  DOMString sizes = "";
  DOMString type = "";
};

A MediaImage object has a source, a list of sizes, and a type.

The MediaImage(init) constructor, when invoked, MUST run the following steps:

  1. Let image be a new MediaImage object.
  2. Set image’s src to init’s src. If the URL is a relative URL, it MUST be resolved to an absolute URL using the document base URL.
  3. Set image’s sizes to init’s sizes.
  4. Set image’s type to init’s type.
  5. Return image.

The MediaImage src, sizes and type attributes are inspired by the image objects in Web App Manifest.

The src attribute MUST return the MediaImage object’s source. It is a URL from which the user agent can fetch the image’s data.

The sizes attribute MUST return the MediaImage object’s sizes. It follows the specification of the sizes attribute of the HTML link element: a string consisting of an unordered set of unique, space-separated, ASCII case-insensitive tokens that represent the dimensions of an image. Each keyword is either an ASCII case-insensitive match for the string "any", or a value that consists of two valid non-negative integers that do not have a leading U+0030 DIGIT ZERO (0) character and that are separated by a single U+0078 LATIN SMALL LETTER X or U+0058 LATIN CAPITAL LETTER X character. The keywords represent icon sizes in raw pixels (as opposed to CSS pixels). When multiple image objects are available, a user agent MAY use the value to decide which icon is most suitable for a display context (and ignore any that are inappropriate). The parsing steps for the sizes attribute MUST follow the parsing steps for the HTML link element sizes attribute.
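The sizes grammar described above can be sketched as a small parser. parseSizes is a hypothetical helper for illustration only; note that, as a simplification, this sketch rejects any integer starting with "0", including "0" itself:

```javascript
// Parse a `sizes` string into tokens per the HTML <link sizes>
// grammar: each token is "any" (case-insensitive) or WxH.
function parseSizes(sizes) {
  return sizes.split(/\s+/).filter(Boolean).map(token => {
    if (token.toLowerCase() === "any") return { any: true };
    // Two non-negative integers without a leading zero, joined by x/X.
    const m = /^([1-9][0-9]*)[xX]([1-9][0-9]*)$/.exec(token);
    if (!m) return null; // invalid token, ignored
    return { width: Number(m[1]), height: Number(m[2]) };
  }).filter(Boolean);
}
```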

The type attribute MUST return the MediaImage object’s type. It is a hint as to the media type of the image. The purpose of this attribute is to allow a user agent to ignore images of media types it does not support.

7. Media Controls

The MediaSession interface defines a set of event handlers used for media controls. They enable the page to listen to media control actions from hardware or software media interfaces on the platform. A media control action is a user command to perform a media-related action, such as playing or pausing media playback or switching media tracks. Media control actions MAY come from various hardware and software interfaces on the platform, including media keys or buttons on keyboards, headphones and remote controls, and buttons in notifications or on lock screens.

For each media control action, there is a corresponding event handler called a media control event handler.

A media control event handler allows a page to register a callback for when the corresponding action is fired. By setting it, the page declares that it supports the corresponding media control action; the user agent could then display a button for the action on platform UI and/or register listeners for the action with the platform, and forward the media control action to the media control event handler when it is received.

The list of media control actions is as follows:

Event type Interface Event handler Fired when...
play Event onplay The user presses the "play" media key or button on any hardware or software media interface.
pause Event onpause The user presses the "pause" media key or button on any hardware or software media interface.
playpause Event onplaypause The user presses the "play/pause" media key or button on any hardware or software media interface.
previoustrack Event onprevioustrack The user presses the "previous track" media key or button on any hardware or software media interface.
nexttrack Event onnexttrack The user presses the "next track" media key or button on any hardware or software media interface.
seekbackward Event onseekbackward The user presses the "seek backward" media key or button on any hardware or software media interface.
seekforward Event onseekforward The user presses the "seek forward" media key or button on any hardware or software media interface.

It is still an open question whether to allow pages to handle play, pause and playpause actions. The user agent MAY either intercept these actions and handle them within the user agent, or forward these actions to the page and let the page handle them. See issue on GitHub.
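Registration of these handlers can be sketched as follows. registerControls is a hypothetical convenience helper, not part of the API; setting each on… attribute is what declares support for that action:

```javascript
// Declare support for a subset of media control actions by setting
// the corresponding event handler attributes on the session.
// Returns false (and does nothing) when no session is available.
function registerControls(session, handlers) {
  if (!session) return false;
  for (const [name, fn] of Object.entries(handlers)) {
    // e.g. name "nexttrack" sets session.onnexttrack.
    session["on" + name] = fn;
  }
  return true;
}
```

For example, registerControls(navigator.mediaSession, { play() { /* ... */ }, nexttrack() { /* ... */ } }) would declare support for the play and nexttrack actions.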

8. Processing model

As there is a Window object per browsing context, the top-level browsing context and each nested browsing context will have an associated MediaSession object. For each tab, the user agent SHOULD select the MediaSession object that best represents the tab. The selected MediaSession object is called the tab-level active media session. The selection of the tab-level active media session is up to the user agent and SHOULD be based on the preferred user experience.

It is still an open question how to select a MediaSession object as the tab-level active media session. Making the MediaSession object in a nested browsing context as the tab-level active media session can be either good or bad in different use cases. See the issue on GitHub.

When the user agent has multiple tabs, the user agent MUST select the most meaningful audio-producing tab, i.e. the tab producing the audio that is most meaningful to the user. The user agent SHOULD select the most meaningful audio-producing tab based on platform conventions and the preferred user experience. The most meaningful audio-producing tab can be null. The tab-level active media session of the most meaningful audio-producing tab is called the most meaningful media session.

The user agent then MUST always route the most meaningful media session to the platform, which means:

  1. If possible, the user agent SHOULD present the metadata of the most meaningful media session to the platform for display purposes. This MUST NOT be done for any other media session.
  2. If possible, the user agent SHOULD register listeners with the platform for the most meaningful media session, display the corresponding UI buttons if needed, and forward all media control actions to the most meaningful media session. This MUST NOT be done for any other media session.

The media metadata for the most meaningful audio-producing tab MAY be displayed in the platform UI depending on platform conventions. Whenever the most meaningful media session changes, or its metadata is set, the user agent MUST run the update metadata algorithm. The steps are as follows:

  1. If the most meaningful audio-producing tab is null, unset the media metadata presented to the platform, and terminate these steps.
  2. If the metadata of the most meaningful media session is null, unset the media metadata presented to the platform, and terminate these steps.
  3. Update the media metadata presented to the platform to match the metadata for the tab-level active media session of the most meaningful audio-producing tab.
  4. If the user agent wants to display an artwork image, it is RECOMMENDED to run the fetch image algorithm.

The RECOMMENDED fetch image algorithm is as follows:

  1. If another instance of the fetch image algorithm is running, cancel it.
  2. If metadata’s artwork of the tab-level active media session of the most meaningful audio-producing tab is empty, then terminate these steps.
  3. If the platform supports displaying media artwork, select a preferred artwork image from metadata’s artwork of the tab-level active media session of the most meaningful audio-producing tab.
  4. Fetch the preferred artwork image’s src.

    Then, in parallel:

    1. Wait for the response.
    2. If the response’s internal response’s type is default, attempt to decode the resource as an image.
    3. If the image format is supported, use the image as the artwork for display in the platform UI. Otherwise the fetch image algorithm fails and these steps terminate.

If no artwork image is successfully fetched by the fetch image algorithm, the user agent MAY have fallback behavior, such as displaying a default image as artwork.
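Step 3 of the algorithm (selecting a preferred artwork image) is left to the user agent; one plausible strategy is to compare each image's first declared width against a target dimension. A sketch, with selectArtwork and targetSize as hypothetical names:

```javascript
// Select the artwork image whose first declared WxH width is closest
// to targetSize. Images without a parseable size are skipped.
function selectArtwork(artwork, targetSize) {
  let best = null;
  let bestDelta = Infinity;
  for (const image of artwork) {
    const m = /([1-9][0-9]*)[xX][1-9][0-9]*/.exec(image.sizes || "");
    if (!m) continue;
    const delta = Math.abs(Number(m[1]) - targetSize);
    if (delta < bestDelta) { best = image; bestDelta = delta; }
  }
  return best; // null signals falling back to a default image
}
```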

Whenever the most meaningful media session changes or the media control event handlers of the most meaningful media session change, the user agent MUST do the following:

  1. If the most meaningful media session is null, unregister all media control listeners registered to the platform and remove the related UI buttons for the media control event handlers if displayed.
  2. Update the listeners registered to the platform to match the media control event handlers registered to the most meaningful media session.
  3. If needed, update the UI buttons displayed on the platform UI to match the media control event handlers.

Please note that a page registering a media control event handler only indicates that it supports handling the corresponding media control action. It does not guarantee that the user agent will display a button in the UI or register listeners with the platform for the media control action. The user agent MAY select a subset of the registered media control event handlers for which to display UI buttons and/or register event listeners with the platform, based on platform capability and conventions, or based on UI concerns.

The user agent MAY have some fallback steps to handle some media control actions, such as handling play in the user agent instead of in the page.

When the user agent receives a media control action from the platform or the UI, it MUST run the following steps:

  1. Create an Event object for the media control action.
  2. Call the corresponding media control event handler of the most meaningful media session.
  3. If the user agent has fallback steps to handle the media control action, check the defaultPrevented attribute of the Event. If it is false, run the fallback steps.

It is still an open question how to opt in/out of the fallback behavior. Checking defaultPrevented only works after the media control event handler is called, so the page could have already performed some steps. This might cause the media control action to be handled both by the page and the user agent, which would produce wrong behavior for actions such as nexttrack or seekforward. See issue on GitHub.
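Under the defaultPrevented proposal discussed above, the dispatch-then-fallback flow could look like the following sketch, where dispatchAction is a hypothetical name and uaFallback stands in for a user-agent-side fallback step:

```javascript
// Dispatch a media control action to the page's handler, then run
// the UA fallback only if the page did not call preventDefault().
function dispatchAction(handler, uaFallback) {
  const event = new Event("play", { cancelable: true });
  if (handler) handler(event);
  if (uaFallback && !event.defaultPrevented) uaFallback();
  return event.defaultPrevented;
}
```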

9. Examples

This section is non-normative.

Setting metadata:
window.navigator.mediaSession.metadata = new MediaMetadata({
  title: "Episode Title",
  artist: "Podcast Host",
  album: "Podcast Title",
  artwork: [{src: "podcast.jpg"}]
});

Alternatively, providing multiple artwork images in the metadata lets the user agent select different artwork images for different display purposes and better fit different screens:

window.navigator.mediaSession.metadata = new MediaMetadata({
  title: "Episode Title",
  artist: "Podcast Host",
  album: "Podcast Title",
  artwork: [
    {src: "podcast.jpg", sizes: "128x128", type: "image/jpeg"},
    {src: "podcast_hd.jpg", sizes: "256x256"},
    {src: "podcast_xhd.jpg", sizes: "1024x1024", type: "image/jpeg"},
    {src: "podcast.png", sizes: "128x128", type: "image/png"},
    {src: "podcast_hd.png", sizes: "256x256", type: "image/png"},
    {src: "podcast.ico", sizes: "128x128 256x256", type: "image/x-icon"}
  ]
});

For example, if the user agent wants to use an image as an icon, it may choose "podcast.jpg" or "podcast.png" for a low-pixel-density screen, and "podcast_hd.jpg" or "podcast_hd.png" for a high-pixel-density screen. If the user agent wants to use an image as a lockscreen background, "podcast_xhd.jpg" would be preferred.

Changing metadata:

For playlists or chapters of an audio book, multiple media elements can share a single media session.

var audio1 = document.createElement("audio");
audio1.src = "chapter1.mp3";

var audio2 = document.createElement("audio");
audio2.src = "chapter2.mp3";

audio1.play();
audio1.addEventListener("ended", function() {
  audio2.play();
});

Because the session is shared, the metadata must be updated to reflect what is currently playing.

function updateMetadata(event) {
  window.navigator.mediaSession.metadata = new MediaMetadata({
    title: event.target == audio1 ? "Chapter 1" : "Chapter 2",
    artist: "An Author",
    album: "A Book",
    artwork: [{src: "cover.jpg"}]
  });
}

audio1.addEventListener("play", updateMetadata);
audio2.addEventListener("play", updateMetadata);

Handling media control actions:

var tracks = ["chapter1.mp3", "chapter2.mp3", "chapter3.mp3"];
var trackId = 0;

var audio = document.createElement("audio");
audio.src = tracks[trackId];

function updatePlayingMedia() {
  audio.src = tracks[trackId];
  // Update metadata (omitted)
}

window.navigator.mediaSession.onprevioustrack = function() {
  trackId = (trackId + tracks.length - 1) % tracks.length;
  updatePlayingMedia();
};

window.navigator.mediaSession.onnexttrack = function() {
  trackId = (trackId + 1) % tracks.length;
  updatePlayingMedia();
};
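The seekbackward and seekforward actions can be handled in the same style. SKIP_SECONDS is an arbitrary interval chosen for illustration, not something the specification mandates:

```javascript
// Handle seek actions by skipping a fixed interval, clamped to the
// start and end of the media resource.
const SKIP_SECONDS = 10;

function attachSeekHandlers(session, audio) {
  session.onseekbackward = function() {
    audio.currentTime = Math.max(0, audio.currentTime - SKIP_SECONDS);
  };
  session.onseekforward = function() {
    audio.currentTime = Math.min(audio.duration, audio.currentTime + SKIP_SECONDS);
  };
}
```

In a page this would be called as attachSeekHandlers(window.navigator.mediaSession, audio).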

Acknowledgments

The editor would like to thank Paul Adenot, Jake Archibald, Tab Atkins, Jonathan Bailey, Marcos Caceres, Domenic Denicola, Ralph Giles, Anne van Kesteren, Tobie Langel, Michael Mahemoff, Jer Noble, Elliott Sprehn, Chris Wilson, and Jörn Zaefferer for their participation in technical discussions that ultimately made this specification possible.

Special thanks go to Philip Jägenstedt and David Vest for their help in designing every aspect of media sessions and for their seemingly infinite patience in working through the initial design issues; Jer Noble for his help in building a model that also works well within the iOS audio focus model; and Mounir Lamouri and Anton Vayvod for their early involvement, feedback and support in making this specification happen.


Per CC0, to the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work.

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

[APPMANIFEST]
Marcos Caceres; et al. Web App Manifest. URL: https://w3c.github.io/manifest/
[FETCH]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WEBIDL]
Cameron McCormack; Boris Zbarsky; Tobie Langel. Web IDL. URL: https://heycam.github.io/webidl/
[WHATWG-DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[WHATWG-URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/

IDL Index

[Exposed=(Window)]
partial interface Navigator {
  readonly attribute MediaSession mediaSession;
};

[Exposed=Window]
interface MediaSession : EventTarget {
  attribute MediaMetadata? metadata;

  attribute EventHandler onplay;
  attribute EventHandler onpause;
  attribute EventHandler onplaypause;
  attribute EventHandler onprevioustrack;
  attribute EventHandler onnexttrack;
  attribute EventHandler onseekbackward;
  attribute EventHandler onseekforward;
};

[Constructor(optional MediaMetadataInit init)]
interface MediaMetadata {
  readonly attribute DOMString title;
  readonly attribute DOMString artist;
  readonly attribute DOMString album;
  [SameObject] readonly attribute FrozenArray<MediaImage> artwork;
};

dictionary MediaMetadataInit {
  DOMString title = "";
  DOMString artist = "";
  DOMString album = "";
  sequence<MediaImageInit> artwork = [];
};

[Constructor(optional MediaImageInit init)]
interface MediaImage {
  readonly attribute USVString src;
  readonly attribute DOMString sizes;
  readonly attribute DOMString type;
};

dictionary MediaImageInit {
  USVString src = "";
  DOMString sizes = "";
  DOMString type = "";
};

Issues Index

It is still an open question whether to allow pages to handle play, pause and playpause actions. The user agent MAY either intercept these actions and handle them within the user agent, or forward these actions to the page and let the page handle them. See issue on GitHub.
It is still an open question how to select a MediaSession object as the tab-level active media session. Making the MediaSession object in a nested browsing context as the tab-level active media session can be either good or bad in different use cases. See the issue on GitHub.
It is still an open question how to opt in/out of the fallback behavior. Checking defaultPrevented only works after the media control event handler is called, so the page could have already performed some steps. This might cause the media control action to be handled both by the page and the user agent, which would produce wrong behavior for actions such as nexttrack or seekforward. See issue on GitHub.