Media Capabilities

Draft Community Group Report

This version:
https://wicg.github.io/media-capabilities/
Issue Tracking:
GitHub
Inline In Spec
Editors:
Mounir Lamouri (Google Inc.)
Participate:
Git Repository.
File an issue.
Version History:
https://github.com/wicg/media-capabilities/commits

Abstract

This specification intends to provide APIs to allow websites to make an optimal decision when picking media content for the user. The APIs will expose information about the decoding and encoding capabilities for a given format but also output capabilities to find the best match based on the device’s display.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

This section is non-normative.

This specification relies on exposing the following sets of properties:

  • The decoding and encoding capabilities of the device, allowing websites to check whether a given media configuration is supported and whether playback or encoding is expected to be smooth and power efficient.
  • The display capabilities of the device, such as the supported color gamut and luminance characteristics of the screen, allowing websites to pick the content that best matches the output.

2. Decoding and Encoding Capabilities

2.1. Media Configurations

2.1.1. MediaConfiguration

dictionary MediaConfiguration {
  VideoConfiguration video;
  AudioConfiguration audio;
};
dictionary MediaDecodingConfiguration : MediaConfiguration {
  required MediaDecodingType type;
};
dictionary MediaEncodingConfiguration : MediaConfiguration {
  required MediaEncodingType type;
};

The input to the decoding capabilities is represented by a MediaDecodingConfiguration dictionary, and the input to the encoding capabilities by a MediaEncodingConfiguration dictionary.

For a MediaConfiguration to be a valid MediaConfiguration, audio or video MUST be present.
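The requirement above can be sketched as a simple check; isValidMediaConfiguration is a hypothetical helper for illustration, not part of the API:

```javascript
// Hypothetical helper mirroring the validity rule above: a
// MediaConfiguration is valid only if audio or video is present.
function isValidMediaConfiguration(configuration) {
  return 'audio' in configuration || 'video' in configuration;
}
```

A configuration carrying neither member would be rejected by decodingInfo() and encodingInfo() with a TypeError.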

2.1.2. MediaDecodingType

enum MediaDecodingType {
  "file",
  "media-source",
};

A MediaDecodingConfiguration can have one of two types:

  • file is used to represent a configuration that is meant to be used for a plain file playback.
  • media-source is used to represent a configuration that is meant to be used for playback of a MediaSource as defined in the [media-source] specification.
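For illustration, the two types map to configurations like the following (the codec string and numeric values are arbitrary examples):

```javascript
// A configuration for plain file playback, e.g. a <video src> element.
const fileConfiguration = {
  type: 'file',
  video: {
    contentType: 'video/webm;codecs=vp9',
    width: 1920,
    height: 1080,
    bitrate: 2646242,   // bits per second
    framerate: '30'
  }
};

// The same media, but played back through Media Source Extensions.
const mediaSourceConfiguration = { ...fileConfiguration, type: 'media-source' };
```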

2.1.3. MediaEncodingType

enum MediaEncodingType {
  "record",
  "transmission"
};

A MediaEncodingConfiguration can have one of two types:

  • record is used to represent a configuration that is meant to be used for recording media, e.g. using MediaRecorder as defined in [mediastream-recording].
  • transmission is used to represent a configuration that is meant to be transmitted over electronic means, e.g. in a real-time communication session.

2.1.4. MIME types

In the context of this specification, a MIME type is also called content type. A valid media MIME type is a string that is a valid MIME type per [mimesniff]. If the MIME type does not imply a codec, the string MUST also have one and only one parameter that is named codecs with a value describing a single media codec. Otherwise, it MUST contain no parameters.

A valid audio MIME type is a string that is a valid media MIME type and for which the type per [RFC7231] is either audio or application.

A valid video MIME type is a string that is a valid media MIME type and for which the type per [RFC7231] is either video or application.
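A rough sketch of these rules follows. Real validity is defined by [mimesniff]; this hypothetical checkMediaContentType helper only approximates it (it does not handle quoted codec strings, and whether the MIME type implies a codec is reduced to a boolean flag):

```javascript
// Simplified sketch of the "valid media MIME type" rule above.
function checkMediaContentType(contentType, impliesCodec = false) {
  const [essence, ...params] = contentType.split(';').map((s) => s.trim());
  const [type, subtype] = essence.split('/');
  if (!type || !subtype) return false;
  if (impliesCodec) {
    // A type such as audio/mpeg already implies its codec: no parameters.
    return params.length === 0;
  }
  // Otherwise, exactly one parameter, named codecs, naming a single codec.
  if (params.length !== 1) return false;
  const [name, value] = params[0].split('=');
  return name === 'codecs' && !!value && !value.includes(',');
}
```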

2.1.5. VideoConfiguration

dictionary VideoConfiguration {
  required DOMString contentType;
  required unsigned long width;
  required unsigned long height;
  required unsigned long long bitrate;
  required DOMString framerate;
};

The contentType member represents the MIME type of the video track.

To check if a VideoConfiguration configuration is a valid video configuration, the following steps MUST be run:

  1. If configuration’s contentType is not a valid video MIME type, return false and abort these steps.
  2. If none of the following is true, return false and abort these steps:
    1. configuration’s framerate can be interpreted as a double whose value is finite and strictly greater than 0.
    2. configuration’s framerate can be interpreted as a fraction x/y where both x and y are finite and strictly greater than 0.
  3. Return true.

The width and height members represent respectively the visible horizontal and vertical encoded pixels in the encoded video frames.

The bitrate member represents the average bitrate of the video track given in units of bits per second. In the case of a video stream encoded at a constant bit rate (CBR), this value should be accurate over a short-term window. In the case of variable bit rate (VBR) encoding, this value should be usable to allocate any necessary buffering and throughput capability to provide for uninterrupted decoding of the video stream over the long term, based on the indicated contentType.

The framerate member represents the framerate of the video track. The framerate is the number of frames used in one second (frames per second). It is represented either as a double or as a fraction.
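The two representations above might be interpreted as follows; parseFramerate is a hypothetical helper, not part of the API:

```javascript
// Hypothetical helper: interprets the framerate member, which the text
// above allows either as a double ("30", "29.97") or as a fraction
// ("30000/1001"). Returns NaN for values that fit neither form.
function parseFramerate(framerate) {
  const fraction = framerate.match(/^(\d+)\/(\d+)$/);
  if (fraction) {
    const denominator = Number(fraction[2]);
    return denominator === 0 ? NaN : Number(fraction[1]) / denominator;
  }
  const value = Number(framerate);
  return Number.isFinite(value) && value > 0 ? value : NaN;
}
```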

2.1.6. AudioConfiguration

dictionary AudioConfiguration {
  required DOMString contentType;
  DOMString channels;
  unsigned long long bitrate;
  unsigned long samplerate;
};

The contentType member represents the MIME type of the audio track.

To check if an AudioConfiguration configuration is a valid audio configuration, the following steps MUST be run:

  1. If configuration’s contentType is not a valid audio MIME type, return false and abort these steps.
  2. Return true.

The channels member represents the audio channels used by the audio track.

The channels needs to be defined as a double (2.1, 4.1, 5.1, ...), an unsigned short (number of channels) or as an enum value. The current definition is a placeholder.

The bitrate member represents the average bitrate of the audio track. The bitrate is the number of bits used to encode a second of the audio track.

The samplerate member represents the sample rate of the audio track, that is the number of samples of audio carried per second.

The samplerate is expressed in Hz (i.e. number of samples of audio per second). Sample rates are sometimes expressed in kHz, representing thousands of samples of audio per second; 44100 Hz is equivalent to 44.1 kHz.
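Putting the members together, an AudioConfiguration might look like this (all values are illustrative):

```javascript
const audioConfiguration = {
  contentType: 'audio/webm;codecs=opus',  // a valid audio MIME type
  channels: '2',        // placeholder form, see the issue above
  bitrate: 132000,      // average bits per second
  samplerate: 48000     // samples per second, in Hz (48 kHz)
};
```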

2.2. Media Capabilities Information

interface MediaCapabilitiesInfo {
  readonly attribute boolean supported;
  readonly attribute boolean smooth;
  readonly attribute boolean powerEfficient;
};

The MediaCapabilitiesInfo has an associated configuration which is a MediaDecodingConfiguration or MediaEncodingConfiguration.

A MediaCapabilitiesInfo has associated supported, smooth, powerEfficient fields which are booleans.

When the create a MediaCapabilitiesInfo algorithm is invoked with a configuration, the user agent MUST run the following steps:

  1. Let info be a new MediaCapabilitiesInfo instance. Unless stated otherwise, reading and writing apply to info for the next steps.
  2. Set info’s configuration to configuration.
  3. If configuration is of type MediaDecodingConfiguration, run the following substeps:
    1. If the user agent is able to decode the media represented by configuration, set supported to true. Otherwise set it to false.
    2. If the user agent is able to decode the media represented by configuration at a pace that allows a smooth playback, set smooth to true. Otherwise set it to false.
    3. If the user agent is able to decode the media represented by configuration in a power efficient manner, set powerEfficient to true. Otherwise set it to false. The user agent SHOULD NOT take into consideration the current power source in order to determine the decoding power efficiency unless the device’s power source has side effects such as enabling different decoding modules.
  4. If configuration is of type MediaEncodingConfiguration, run the following substeps:
    1. If the user agent is able to encode the media represented by configuration, set supported to true. Otherwise set it to false.
    2. If the user agent is able to encode the media represented by configuration at a pace that allows encoding frames at the same pace as they are sent to the encoder, set smooth to true. Otherwise set it to false.
    3. If the user agent is able to encode the media represented by configuration in a power efficient manner, set powerEfficient to true. Otherwise set it to false. The user agent SHOULD NOT take into consideration the current power source in order to determine the encoding power efficiency unless the device’s power source has side effects such as enabling different encoding modules.
  5. Return info.

The supported attribute MUST return supported.

The smooth attribute MUST return smooth.

The powerEfficient attribute MUST return powerEfficient.

Authors can use powerEfficient together with the Battery Status API [battery-status] in order to determine whether the media they would like to play is appropriate for the user’s configuration. It is worth noting that even when a device is not power constrained, high power usage has side effects such as increasing the temperature or fan noise.
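For example, a page might combine powerEfficient with [battery-status] to fall back to a lighter stream. This sketch assumes a browser implementing both APIs; chooseStream, hdConfig and sdConfig are hypothetical names, and nav is expected to be the global navigator object:

```javascript
// Sketch: prefer the HD stream, but fall back to the SD stream when HD
// decoding is unsupported, or when it is not power efficient while the
// device is discharging.
async function chooseStream(nav, hdConfig, sdConfig) {
  const info = await nav.mediaCapabilities.decodingInfo(hdConfig);
  if (!info.supported) return sdConfig;
  const battery = await nav.getBattery();
  return (!info.powerEfficient && !battery.charging) ? sdConfig : hdConfig;
}
```

In a browser, this would be called as chooseStream(navigator, hdConfig, sdConfig).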

2.3. Navigators extension

[Exposed=(Window)]
partial interface Navigator {
  [SameObject] readonly attribute MediaCapabilities mediaCapabilities;
};
[Exposed=(Worker)]
partial interface WorkerNavigator {
  [SameObject] readonly attribute MediaCapabilities mediaCapabilities;
};

2.4. Media Capabilities Interface

[Exposed=(Window, Worker)]
interface MediaCapabilities {
  Promise<MediaCapabilitiesInfo> decodingInfo(MediaDecodingConfiguration configuration);
  Promise<MediaCapabilitiesInfo> encodingInfo(MediaEncodingConfiguration configuration);
};

The decodingInfo() method and the encodingInfo() method MUST run the following steps:

  1. If configuration is not a valid MediaConfiguration, return a Promise rejected with a TypeError.
  2. If configuration.video is present and is not a valid video configuration, return a Promise rejected with a TypeError.
  3. If configuration.audio is present and is not a valid audio configuration, return a Promise rejected with a TypeError.
  4. Let p be a new promise.
  5. In parallel, run the create a MediaCapabilitiesInfo algorithm with configuration and resolve p with its result.
  6. Return p.
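Steps 1 through 3 reject the promise before any capability probing happens. This early validation can be sketched as follows; earlyValidationError is a hypothetical helper, and the isValidVideo/isValidAudio callbacks stand in for the valid video configuration and valid audio configuration algorithms above:

```javascript
// Returns a TypeError matching steps 1-3 above, or null if the
// configuration passes early validation.
function earlyValidationError(configuration, isValidVideo, isValidAudio) {
  if (!('video' in configuration) && !('audio' in configuration)) {
    return new TypeError('not a valid MediaConfiguration');
  }
  if ('video' in configuration && !isValidVideo(configuration.video)) {
    return new TypeError('not a valid video configuration');
  }
  if ('audio' in configuration && !isValidAudio(configuration.audio)) {
    return new TypeError('not a valid audio configuration');
  }
  return null;  // steps 4-6: create a MediaCapabilitiesInfo in parallel.
}
```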

3. Display Capabilities

3.1. Screen Luminance

interface ScreenLuminance {
  readonly attribute double min;
  readonly attribute double max;
  readonly attribute double maxAverage;
};

The ScreenLuminance object represents the known luminance characteristics of the screen.

The min attribute MUST return the minimal screen luminance that a pixel of the screen can emit in candela per square metre. The minimal screen luminance is the luminance used when showing the darkest color a pixel on the screen can display.

The max attribute MUST return the maximal screen luminance that a pixel of the screen can emit in candela per square metre. The maximal screen luminance is the luminance used when showing the whitest color a pixel on the screen can display.

The maxAverage attribute MUST return the maximal average screen luminance that the screen can emit in candela per square metre. The maximal average screen luminance is the maximal luminance value attainable when all the pixels of the screen emit the same luminance. The value returned by maxAverage is expected to differ from max, as screens usually can’t apply the maximal screen luminance to the entire panel.

3.2. Screen Color Gamut

enum ScreenColorGamut {
  "srgb",
  "p3",
  "rec2020",
};

The ScreenColorGamut represents the color gamut supported by a Screen, that is, the range of colors that the screen can display.

The ScreenColorGamut values are:

  • srgb represents the [sRGB] color gamut.
  • p3 represents the DCI P3 Color Space color gamut. This color gamut includes the srgb gamut.
  • rec2020 represents the ITU-R Recommendation BT.2020 color gamut. This color gamut includes the p3 gamut.

3.3. Screen Color Depth

The screen color depth of a given screen is the number of bits used to represent a color on the screen. Most screens will return 24. Screens able to represent a wider color range encode colors using more than 24 bits.

3.4. Screen extension

Part of this section is 🐵 patching of the CSSOM View Module. Issue #4 is tracking merging the changes. This partial interface requires the Screen interface to become an EventTarget.

partial interface Screen {
  readonly attribute ScreenColorGamut colorGamut;
  readonly attribute ScreenLuminance? luminance;

  attribute EventHandler onchange;
};

The colorGamut attribute SHOULD return the ScreenColorGamut approximately supported by the screen. In other words, the screen does not need to fully support the given color gamut but needs to be close enough. If the user agent does not know the color gamut supported by the screen, if the supported color gamut is lower than srgb, or if the user agent does not want to expose this information for privacy considerations, it SHOULD return srgb as a default value. The value returned by colorGamut MUST match the value returned by the color-gamut CSS media query.

The luminance attribute SHOULD return a ScreenLuminance object that will expose the luminance characteristics of the screen. If the user agent has no access to the luminance characteristics of the screen, it MUST return null. The user agent MAY also return null if it does not want to expose the luminance information for privacy reasons.

The onchange attribute is an event handler whose corresponding event handler event type is change.

Whenever the user agent is aware that the state of the Screen object has changed, that is if one of the values exposed on the Screen object or on an object exposed on the Screen object has changed, it MUST queue a task to fire an event named change on Screen.
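Put together, a page could observe the extended Screen interface as follows. watchScreen is a hypothetical helper; in a browser supporting this extension, the argument would be the global screen object:

```javascript
// Logs the screen's characteristics now and whenever they change.
function watchScreen(screen) {
  const describe = () => {
    // luminance may be null when unknown or withheld for privacy.
    const peak = screen.luminance ? screen.luminance.max : 'unknown';
    return 'gamut=' + screen.colorGamut + ' peakLuminance=' + peak;
  };
  screen.onchange = () => console.log('screen changed: ' + describe());
  return describe();
}
```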

4. Security and Privacy Considerations

This specification does not introduce any security-sensitive information or APIs, but it provides easier access to some information that can be used to fingerprint users.

4.1. Decoding/Encoding and Fingerprinting

The information exposed by the decoding/encoding capabilities can already be discovered via experimentation with the exception that the API will likely provide more accurate and consistent information. This information is expected to have a high correlation with other information already available to the web pages as a given class of device is expected to have very similar decoding/encoding capabilities. In other words, high end devices from a certain year are expected to decode some type of videos while older devices may not. Therefore, it is expected that the entropy added with this API isn’t going to be significant.

If an implementation wishes to implement a fingerprint-proof version of this specification, it is recommended to fake a given set of capabilities (i.e. claim to decode up to 1080p VP9, etc.) instead of always returning yes or always returning no, as the latter approach could considerably degrade the user’s experience.

4.2. Display and Fingerprinting

The information exposed by the display capabilities can, for the most part, already be accessed via CSS. The specification also provides default values for when the user agent does not wish to expose the feature for privacy reasons.

5. Examples

5.1. Query recording capabilities with encodingInfo()

The following example can also be found in e.g. this codepen with minimal modifications.
<script>
  const configuration = {
    type: 'record',
    video: {
      contentType: 'video/webm;codecs=vp8',
      width: 640,
      height: 480,
      bitrate: 10000,
      framerate: '30'
    }
  };
  navigator.mediaCapabilities.encodingInfo(configuration)
      .then((result) => {
        console.log(configuration.video.contentType + ' is:'
            + (result.supported ? '' : ' NOT') + ' supported,'
            + (result.smooth ? '' : ' NOT') + ' smooth and'
            + (result.powerEfficient ? '' : ' NOT') + ' power efficient');
      })
      .catch((err) => {
        console.error(err, ' caused encodingInfo to reject');
      });
</script>

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

[CSSOM-VIEW-1]
Simon Pieters. CSSOM View Module. 17 March 2016. WD. URL: https://www.w3.org/TR/cssom-view-1/
[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[MEDIA-SOURCE]
Matthew Wolenetz; et al. Media Source Extensions™. 17 November 2016. REC. URL: https://www.w3.org/TR/media-source/
[MEDIAQUERIES-4]
Florian Rivoal; Tab Atkins Jr.. Media Queries Level 4. 5 September 2017. CR. URL: https://www.w3.org/TR/mediaqueries-4/
[MIMESNIFF]
Gordon P. Hemsley. MIME Sniffing Standard. Living Standard. URL: https://mimesniff.spec.whatwg.org/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WebIDL]
Cameron McCormack; Boris Zbarsky; Tobie Langel. Web IDL. 15 December 2016. ED. URL: https://heycam.github.io/webidl/

Informative References

[BATTERY-STATUS]
Anssi Kostiainen; Mounir Lamouri. Battery Status API. 7 July 2016. CR. URL: https://www.w3.org/TR/battery-status/
[MEDIA-PLAYBACK-QUALITY]
Media Playback Quality Specification. CG-DRAFT. URL: https://wicg.github.io/media-playback-quality/
[MEDIASTREAM-RECORDING]
Miguel Casas-sanchez; James Barnett; Travis Leithead. MediaStream Recording. 21 June 2017. WD. URL: https://www.w3.org/TR/mediastream-recording/
[RFC7231]
R. Fielding, Ed.; J. Reschke, Ed.. Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. June 2014. Proposed Standard. URL: https://tools.ietf.org/html/rfc7231
[sRGB]
Multimedia systems and equipment - Colour measurement and management - Part 2-1: Colour management - Default RGB colour space - sRGB. URL: https://webstore.iec.ch/publication/6169

IDL Index

dictionary MediaConfiguration {
  VideoConfiguration video;
  AudioConfiguration audio;
};

dictionary MediaDecodingConfiguration : MediaConfiguration {
  required MediaDecodingType type;
};

dictionary MediaEncodingConfiguration : MediaConfiguration {
  required MediaEncodingType type;
};

enum MediaDecodingType {
  "file",
  "media-source",
};

enum MediaEncodingType {
  "record",
  "transmission"
};

dictionary VideoConfiguration {
  required DOMString contentType;
  required unsigned long width;
  required unsigned long height;
  required unsigned long long bitrate;
  required DOMString framerate;
};

dictionary AudioConfiguration {
  required DOMString contentType;
  DOMString channels;
  unsigned long long bitrate;
  unsigned long samplerate;
};

interface MediaCapabilitiesInfo {
  readonly attribute boolean supported;
  readonly attribute boolean smooth;
  readonly attribute boolean powerEfficient;
};

[Exposed=(Window)]
partial interface Navigator {
  [SameObject] readonly attribute MediaCapabilities mediaCapabilities;
};

[Exposed=(Worker)]
partial interface WorkerNavigator {
  [SameObject] readonly attribute MediaCapabilities mediaCapabilities;
};

[Exposed=(Window, Worker)]
interface MediaCapabilities {
  Promise<MediaCapabilitiesInfo> decodingInfo(MediaDecodingConfiguration configuration);
  Promise<MediaCapabilitiesInfo> encodingInfo(MediaEncodingConfiguration configuration);
};

interface ScreenLuminance {
  readonly attribute double min;
  readonly attribute double max;
  readonly attribute double maxAverage;
};

enum ScreenColorGamut {
  "srgb",
  "p3",
  "rec2020",
};

partial interface Screen {
  readonly attribute ScreenColorGamut colorGamut;
  readonly attribute ScreenLuminance? luminance;

  attribute EventHandler onchange;
};

Issues Index

The channels needs to be defined as a double (2.1, 4.1, 5.1, ...), an unsigned short (number of channels) or as an enum value. The current definition is a placeholder.
Part of this section is 🐵 patching of the CSSOM View Module. Issue #4 is tracking merging the changes. This partial interface requires the Screen interface to become an EventTarget.