Media Session Standard

Editor’s Draft,

This version:
https://wicg.github.io/mediasession
Editors:
(Google Inc.)
(Google Inc.)
Participate:
File an issue (open issues)
Version History:
https://github.com/WICG/mediasession/commits

Abstract

This specification enables web developers to obtain different levels of platform media focus, customize available platform media controls, and access platform media keys such as hardware keys found on keyboards, headsets, remote controls, and software keys found in notification areas and on lock screens of mobile devices.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

This section is non-normative.

Media is used extensively today, and the Web is one of the primary means of consuming media content. Many platforms can display media metadata, such as title, artist, album and album art, on UI surfaces such as notifications, media control centers, device lockscreens and wearable devices. This specification aims to enable web pages to specify the media metadata to be displayed in platform UI, which helps improve the user experience.

2. Conformance

All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119. For readability, these words do not appear in all uppercase letters in this specification. [RFC2119]

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and terminate these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.

When a method or an attribute is said to call another method or attribute, the user agent must invoke its internal API for that attribute or method so that e.g. the author can’t change the behavior by overriding attributes or methods with custom properties or functions in JavaScript.

Unless otherwise stated, string comparisons are done in a case-sensitive manner.

3. Dependencies

The IDL fragments in this specification must be interpreted as required for conforming IDL fragments, as described in the Web IDL specification. [WEBIDL]

4. The MediaSession interface

[Exposed=(Window)]
partial interface Navigator {
  readonly attribute MediaSession mediaSession;
};

interface MediaSession {
  attribute MediaMetadata? metadata;
};

The mediaSession attribute is used to retrieve an instance of the MediaSession interface. On getting, the attribute MUST return the MediaSession instance associated with the Navigator object.

MediaSession objects are simply known as media sessions.

A media session has metadata, which is either a MediaMetadata object or null.

session . metadata
Returns the media session’s MediaMetadata object, if any, or null otherwise.

Can be set, to a MediaMetadata object or null.

The metadata attribute, on getting, must return the media session’s metadata.

5. The MediaMetadata interface

[Constructor(optional MediaMetadataInit init)]
interface MediaMetadata {
  readonly attribute DOMString title;
  readonly attribute DOMString artist;
  readonly attribute DOMString album;
  [SameObject] readonly attribute FrozenArray<MediaImage> artwork;
};

dictionary MediaMetadataInit {
  DOMString title = "";
  DOMString artist = "";
  DOMString album = "";
  sequence<MediaImageInit> artwork = [];
};

A MediaMetadata object has a title, an artist, an album and a FrozenArray of artwork images.

The MediaMetadata(init) constructor, when invoked, must run the following steps:

  1. Let metadata be a new MediaMetadata object.
  2. Set metadata’s title to init’s title.
  3. Set metadata’s artist to init’s artist.
  4. Set metadata’s album to init’s album.
  5. Set metadata’s artwork by converting each entry of init’s artwork into a MediaImage using the MediaImage(init) constructor.
  6. Return metadata.
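The constructor steps above can be sketched as a plain JavaScript class. This is a non-normative illustration; MediaMetadataSketch and MediaImageSketch are hypothetical names used for the sketch, not the platform interfaces:

```javascript
// Non-normative sketch of the MediaMetadata(init) constructor steps.
// MediaImageSketch / MediaMetadataSketch are illustrative names only.
class MediaImageSketch {
  constructor(init = {}) {
    this.src = init.src ?? "";
    this.sizes = init.sizes ?? "";
    this.type = init.type ?? "";
  }
}

class MediaMetadataSketch {
  constructor(init = {}) {
    // Steps 2-4: copy title, artist and album, defaulting to "".
    this.title = init.title ?? "";
    this.artist = init.artist ?? "";
    this.album = init.album ?? "";
    // Step 5: convert each artwork entry via the MediaImage constructor,
    // exposing the result as a frozen array (mirroring FrozenArray<MediaImage>).
    this.artwork = Object.freeze(
      (init.artwork ?? []).map((img) => new MediaImageSketch(img))
    );
  }
}
```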

The title attribute must return the MediaMetadata object’s title.

The artist attribute must return the MediaMetadata object’s artist.

The album attribute must return the MediaMetadata object’s album.

The artwork attribute must return the MediaMetadata object’s artwork images, as a FrozenArray of MediaImages. The artwork attribute can be empty.

6. The MediaImage interface

[Constructor(optional MediaImageInit init)]
interface MediaImage {
  readonly attribute USVString src;
  readonly attribute DOMString sizes;
  readonly attribute DOMString type;
};

dictionary MediaImageInit {
  USVString src = "";
  DOMString sizes = "";
  DOMString type = "";
};

A MediaImage object has a source, a list of sizes, and a type.

The MediaImage(init) constructor, when invoked, must run the following steps:

  1. Let image be a new MediaImage object.
  2. Set image’s src to init’s src. If the URL is a relative URL, it must be resolved to an absolute URL using the document base URL.
  3. Set image’s sizes to init’s sizes.
  4. Set image’s type to init’s type.
  5. Return image.
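The relative-URL resolution in step 2 follows the URL Standard [WHATWG-URL]. A non-normative sketch using the WHATWG URL API (the base URL value below is illustrative, standing in for the document base URL):

```javascript
// Non-normative sketch of step 2: resolving a possibly-relative src
// against the document base URL.
function resolveArtworkSrc(src, documentBaseURL) {
  // new URL(input, base) implements WHATWG URL resolution:
  // absolute inputs pass through, relative ones resolve against base.
  return new URL(src, documentBaseURL).href;
}
```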

The MediaImage src, sizes and type attributes are inspired by the image objects in Web App Manifest.

The src attribute must return the MediaImage object’s source. It is a URL from which the user agent can fetch the image’s data.

The sizes attribute must return the MediaImage object’s sizes. It follows the specification of the sizes attribute of the HTML link element: a string consisting of an unordered set of unique, ASCII case-insensitive, space-separated tokens representing the dimensions of an image. Each keyword is either an ASCII case-insensitive match for the string "any", or a value consisting of two valid non-negative integers that do not have a leading U+0030 DIGIT ZERO (0) character and that are separated by a single U+0078 LATIN SMALL LETTER X or U+0058 LATIN CAPITAL LETTER X character. The keywords represent icon sizes in raw pixels (as opposed to CSS pixels). When multiple image objects are available, a user agent may use the value to decide which icon is most suitable for a given display context (and ignore any that are inappropriate). The parsing steps for the sizes attribute must follow the parsing steps for the HTML link element’s sizes attribute.
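The keyword grammar above can be sketched as a small validator. This is a non-normative illustration of the token rules, not the full HTML parsing steps; the function names are hypothetical:

```javascript
// Non-normative sketch of validating one sizes keyword: either an
// ASCII case-insensitive "any", or <width>x<height> with non-negative
// integers that have no leading zero, separated by x or X.
function isValidSizesKeyword(keyword) {
  if (/^any$/i.test(keyword)) return true;
  const dim = "[1-9][0-9]*"; // no leading U+0030 DIGIT ZERO
  return new RegExp(`^${dim}[xX]${dim}$`).test(keyword);
}

// Split the unordered set of space-separated tokens and keep the
// valid keywords.
function parseSizes(sizes) {
  return sizes.split(/\s+/).filter(Boolean).filter(isValidSizesKeyword);
}
```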

The type attribute must return the MediaImage object’s type. It is a hint as to the media type of the image. The purpose of this attribute is to allow a user agent to ignore images of media types it does not support.

7. Processing model

As there is a Window object per browsing context, the top-level browsing context and each nested browsing context will have an associated MediaSession object. For each tab, the user agent MUST select the MediaSession object of the top-level browsing context to represent the tab. The selected MediaSession object is called tab-level active media session.

It is still an open question whether a MediaSession object is allowed to become the tab-level active media session. See the issue on GitHub.

When the user agent has multiple tabs, it MUST select the most meaningful audio-producing tab and present that tab’s tab-level active media session to the platform, which MAY display it in the platform UI depending on platform conventions. The most meaningful audio-producing tab is the tab producing the audio that matters most to the user; the user agent SHOULD select it based on platform conventions and the preferred user experience. The most meaningful audio-producing tab can be null.

Whenever the most meaningful audio-producing tab changes, or the metadata of the most meaningful audio-producing tab is set, the user agent MUST run the update metadata algorithm. The steps are as follows:

  1. If the most meaningful audio-producing tab is null, unset the media metadata presented to the platform, and terminate these steps.
  2. If the metadata for the tab-level active media session of the most meaningful audio-producing tab is null, unset the media metadata presented to the platform, and terminate these steps.
  3. Update the media metadata presented to the platform to match the metadata for the tab-level active media session of the most meaningful audio-producing tab.
  4. If the user agent wants to display an artwork image, it is recommended to run the fetch image algorithm.
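Steps 1-3 of the update metadata algorithm can be sketched as follows. This is non-normative; the platform hooks (presentMetadata/unsetMetadata) and the activeMediaSession property are hypothetical stand-ins for however a user agent talks to the platform UI:

```javascript
// Non-normative sketch of the update metadata algorithm (steps 1-3;
// step 4, artwork fetching, is omitted here). The platform object and
// activeMediaSession property are hypothetical.
function updateMetadata(mostMeaningfulTab, platform) {
  // Step 1: no most meaningful audio-producing tab selected.
  if (mostMeaningfulTab === null) {
    platform.unsetMetadata();
    return;
  }
  // Step 2: the tab-level active media session has no metadata.
  const metadata = mostMeaningfulTab.activeMediaSession.metadata;
  if (metadata === null) {
    platform.unsetMetadata();
    return;
  }
  // Step 3: push the metadata to the platform.
  platform.presentMetadata(metadata);
}
```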

The recommended fetch image algorithm is as follows:

  1. If there are other instances of the fetch image algorithm running, cancel them.
  2. If metadata’s artwork of the tab-level active media session of the most meaningful audio-producing tab is empty, then terminate these steps.
  3. If the platform supports displaying media artwork, select a preferred artwork image from the metadata’s artwork of the tab-level active media session of the most meaningful audio-producing tab.
  4. Fetch the preferred artwork image’s src.

    Then, in parallel:

    1. Wait for the response.
    2. If the response’s internal response’s type is default, attempt to decode the resource as an image.
    3. If the image format is supported, use the image as the artwork for display in the platform UI. Otherwise, the fetch image algorithm fails and terminates.

If no artwork images are fetched in the fetch image algorithm, the user agent MAY have fallback behavior such as displaying a default image as artwork.
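One way a user agent might implement the "select a preferred artwork image" step is to compare declared sizes against a target display edge length. This heuristic and the helper name are illustrative, not mandated by this specification:

```javascript
// Non-normative sketch of one possible artwork-selection heuristic:
// prefer the smallest image at least as large as the target edge,
// falling back to the largest available image otherwise.
function selectArtwork(artwork, targetEdge) {
  let best = null;
  let bestScore = Infinity;
  for (const image of artwork) {
    const match = /(\d+)[xX](\d+)/.exec(image.sizes || "");
    // Images without a parseable size are treated as size 0 (last resort).
    const edge = match ? Math.max(Number(match[1]), Number(match[2])) : 0;
    // Images at least as large as the target score by how much they
    // overshoot; smaller images get a large penalty so they rank last.
    const score = edge >= targetEdge ? edge - targetEdge : targetEdge - edge + 1e6;
    if (score < bestScore) {
      bestScore = score;
      best = image;
    }
  }
  return best;
}
```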

8. Examples

This section is non-normative.

window.navigator.mediaSession.metadata = new MediaMetadata({
  title: "Episode Title",
  artist: "Podcast Host",
  album: "Podcast Title",
  artwork: [{src: "podcast.jpg"}]
});

Alternatively, providing multiple artwork images in the metadata lets the user agent select different artwork images for different display purposes, so the artwork better fits different screens:

window.navigator.mediaSession.metadata = new MediaMetadata({
  title: "Episode Title",
  artist: "Podcast Host",
  album: "Podcast Title",
  artwork: [
    {src: "podcast.jpg", sizes: "128x128", type: "image/jpeg"},
    {src: "podcast_hd.jpg", sizes: "256x256"},
    {src: "podcast_xhd.jpg", sizes: "1024x1024", type: "image/jpeg"},
    {src: "podcast.png", sizes: "128x128", type: "image/png"},
    {src: "podcast_hd.png", sizes: "256x256", type: "image/png"},
    {src: "podcast.ico", sizes: "128x128 256x256", type: "image/x-icon"}
  ]
});

For example, if the user agent wants to use an image as an icon, it may choose "podcast.jpg" or "podcast.png" for a low-pixel-density screen, and "podcast_hd.jpg" or "podcast_hd.png" for a high-pixel-density screen. If the user agent wants to use an image for the lockscreen background, "podcast_xhd.jpg" will be preferred.

For playlists or chapters of an audio book, multiple media elements can share a single media session.

var audio1 = document.createElement("audio");
audio1.src = "chapter1.mp3";

var audio2 = document.createElement("audio");
audio2.src = "chapter2.mp3";

audio1.play();
audio1.addEventListener("ended", function() {
  audio2.play();
});

Because the session is shared, the metadata must be updated to reflect what is currently playing.

function updateMetadata(event) {
  window.navigator.mediaSession.metadata = new MediaMetadata({
    title: event.target == audio1 ? "Chapter 1" : "Chapter 2",
    artist: "An Author",
    album: "A Book",
    artwork: [{src: "cover.jpg"}]
  });
}

audio1.addEventListener("play", updateMetadata);
audio2.addEventListener("play", updateMetadata);

Acknowledgments

The editor would like to thank Paul Adenot, Jake Archibald, Tab Atkins, Jonathan Bailey, Marcos Caceres, Domenic Denicola, Ralph Giles, Anne van Kesteren, Tobie Langel, Michael Mahemoff, Jer Noble, Elliott Sprehn, Chris Wilson, and Jörn Zaefferer for their participation in technical discussions that ultimately made this specification possible.

Special thanks go to Philip Jägenstedt and David Vest for their help in designing every aspect of media sessions and for their seemingly infinite patience in working through the initial design issues; Jer Noble for his help in building a model that also works well within the iOS audio focus model; and Mounir Lamouri and Anton Vayvod for their early involvement, feedback and support in making this specification happen.

This standard is written by Rich Tibbett (Opera, richt@opera.com).

Per CC0, to the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work.


References

Normative References

[APPMANIFEST]
Marcos Caceres; et al. Web App Manifest. 12 September 2016. WD. URL: https://w3c.github.io/manifest/
[FETCH]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[HTML]
Ian Hickson. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WEBIDL]
Cameron McCormack; Boris Zbarsky. WebIDL Level 1. 8 March 2016. CR. URL: https://heycam.github.io/webidl/
[WHATWG-URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/

IDL Index

[Exposed=(Window)]
partial interface Navigator {
  readonly attribute MediaSession mediaSession;
};

interface MediaSession {
  attribute MediaMetadata? metadata;
};

[Constructor(optional MediaMetadataInit init)]
interface MediaMetadata {
  readonly attribute DOMString title;
  readonly attribute DOMString artist;
  readonly attribute DOMString album;
  [SameObject] readonly attribute FrozenArray<MediaImage> artwork;
};

dictionary MediaMetadataInit {
  DOMString title = "";
  DOMString artist = "";
  DOMString album = "";
  sequence<MediaImageInit> artwork = [];
};

[Constructor(optional MediaImageInit init)]
interface MediaImage {
  readonly attribute USVString src;
  readonly attribute DOMString sizes;
  readonly attribute DOMString type;
};

dictionary MediaImageInit {
  USVString src = "";
  DOMString sizes = "";
  DOMString type = "";
};

Issues Index

It is still an open question whether a MediaSession object is allowed to become the tab-level active media session. See the issue on GitHub.