WebCodecs

Draft Community Group Report

This version:
https://wicg.github.io/web-codecs/
Issue Tracking:
GitHub
Editors:
Chris Cunningham (Google Inc.)
Paul Adenot (Mozilla)
Participate:
Git Repository.
File an issue.
Version History:
https://github.com/wicg/web-codecs/commits

Abstract

This specification defines interfaces for encoding and decoding audio and video. It also includes an interface for retrieving raw video frames from MediaStreamTracks.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Definitions

Codec

Refers generically to the types: AudioDecoder, AudioEncoder, VideoDecoder, and VideoEncoder.

Key Frame

An encoded frame that does not depend on any other frames for decoding.

2. Codec Processing Model

New codec tasks may be scheduled while previous tasks are still pending. For example, web authors may call decode() without waiting for the previous decode() to generate an output. This is facilitated by the following mechanisms.

Each codec has a single control message queue that is a list of control messages.

Queuing a control message means adding the message to the end of a codec’s control message queue. Invoking codec methods will often queue a control message to schedule work.

Running a control message means executing a sequence of steps specified by the method that enqueued the message.

Control messages in a control message queue are ordered by time of insertion. The oldest message is therefore the one at the front of the control message queue.

Running the control message processing loop means executing these steps.

  1. While the control message queue is not empty

    1. Let front message be the oldest message in the control message queue.

    2. If front message cannot be executed now, return.

      The User Agent must decide, as an implementation detail, when processing is blocked by ongoing work (e.g. the underlying decoder cannot yet accept more requests), and must restart the processing loop once the blockage is resolved.

      NOTE: A blocked processing loop is visible to authors via the decodeQueueSize and encodeQueueSize attributes.

    3. Dequeue front message from the control message queue.

    4. Run the front message control message steps.
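For example (informative), authors may queue several decodes back-to-back without awaiting outputs. The names decoder and chunks below are hypothetical (a configured VideoDecoder and an array of EncodedVideoChunks), and the snippet is assumed to run inside an async function:

// Each decode() queues a control message; none of these calls block.
for (const chunk of chunks) {
  decoder.decode(chunk);
}

// A busy or blocked processing loop is observable via the queue size.
console.log('pending decodes:', decoder.decodeQueueSize);

// flush() resolves once all queued control messages have completed.
await decoder.flush();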

3. AudioDecoder Interface

[Exposed=(Window,Worker)]
interface AudioDecoder {
  constructor(AudioDecoderInit init);

  readonly attribute CodecState state;
  readonly attribute long decodeQueueSize;

  undefined configure(AudioDecoderConfig config);
  undefined decode(EncodedAudioChunk chunk);
  Promise<undefined> flush();
  undefined reset();
  undefined close();
};

dictionary AudioDecoderInit {
  required AudioFrameOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback AudioFrameOutputCallback = undefined(AudioFrame output);

3.1. Internal Slots

[[codec implementation]]
Underlying decoder implementation provided by the User Agent.
[[output callback]]
Callback given at construction for decoded outputs.
[[error callback]]
Callback given at construction for decode errors.

3.2. Constructors

AudioDecoder(init)
  1. Let d be a new AudioDecoder object.

  2. Assign init.output to the [[output callback]] internal slot.

  3. Assign init.error to the [[error callback]] internal slot.

  4. Assign "unconfigured" to d.state.

  5. Return d.

3.3. Attributes

state, of type CodecState, readonly
Describes the current state of the codec.
decodeQueueSize, of type long, readonly
The number of pending decode requests. This does not include requests that have been sent to the underlying codec.

3.4. Methods

configure(config)
Enqueues a control message to configure the audio decoder for decoding chunks as described by config.

When invoked, run these steps:

  1. If config is not a valid AudioDecoderConfig, throw a TypeError.

  2. Run the Configure Decoder algorithm with config.

decode(chunk)
Enqueues a control message to decode the given chunk.

When invoked, run these steps:

  1. Let output algorithm be the AudioFrame Output algorithm.

  2. Run the Decode Chunk algorithm with chunk and output algorithm.

flush()
Completes all control messages in the control message queue and emits all outputs.

When invoked, run these steps:

  1. Let output algorithm be the AudioFrame Output algorithm.

  2. Run the Flush algorithm with output algorithm.

reset()
Immediately resets all state including configuration, control messages in the control message queue, and all pending callbacks.

When invoked, run the Reset algorithm.

close()
Immediately aborts all pending work and releases system resources. Close is permanent.

When invoked, run the Close algorithm.
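For example (informative), basic AudioDecoder usage might look like the following sketch. The codec string and config values are illustrative, chunk names a hypothetical EncodedAudioChunk, and the snippet is assumed to run inside an async function:

// Decoded AudioFrames and errors arrive via the constructor callbacks.
const decoder = new AudioDecoder({
  output: (frame) => { /* consume frame.buffer */ },
  error: (e) => console.error(e),
});

// configure() and decode() queue control messages; neither blocks.
decoder.configure({
  codec: 'opus',
  sampleRate: 48000,
  numberOfChannels: 2,
});
decoder.decode(chunk);

// Resolves once all queued work is done and outputs are emitted.
await decoder.flush();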

4. VideoDecoder Interface

[Exposed=(Window,Worker)]
interface VideoDecoder {
  constructor(VideoDecoderInit init);

  readonly attribute CodecState state;
  readonly attribute long decodeQueueSize;

  undefined configure(VideoDecoderConfig config);
  undefined decode(EncodedVideoChunk chunk);
  Promise<undefined> flush();
  undefined reset();
  undefined close();
};

dictionary VideoDecoderInit {
  required VideoFrameOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback VideoFrameOutputCallback = undefined(VideoFrame output);

4.1. Internal Slots

[[codec implementation]]
Underlying decoder implementation provided by the User Agent.
[[output callback]]
Callback given at construction for decoded outputs.
[[error callback]]
Callback given at construction for decode errors.

4.2. Constructors

VideoDecoder(init)
  1. Let d be a new VideoDecoder object.

  2. Assign init.output to the [[output callback]] internal slot.

  3. Assign init.error to the [[error callback]] internal slot.

  4. Assign "unconfigured" to d.state.

  5. Return d.

4.3. Attributes

state, of type CodecState, readonly
Describes the current state of the codec.
decodeQueueSize, of type long, readonly
The number of pending decode requests. This does not include requests that have been sent to the underlying codec.

4.4. Methods

configure(config)
Enqueues a control message to configure the video decoder for decoding chunks as described by config.

When invoked, run these steps:

  1. If config is not a valid VideoDecoderConfig, throw a TypeError.

  2. Run the Configure Decoder algorithm with config.

decode(chunk)
Enqueues a control message to decode the given chunk.

When invoked, run these steps:

  1. Let output algorithm be the VideoFrame Output algorithm.

  2. Run the Decode Chunk algorithm with chunk and output algorithm.

flush()
Completes all control messages in the control message queue and emits all outputs.

When invoked, run these steps:

  1. Let output algorithm be the VideoFrame Output algorithm.

  2. Run the Flush algorithm with output algorithm.

reset()
Immediately resets all state including configuration, control messages in the control message queue, and all pending callbacks.

When invoked, run the Reset algorithm.

close()
Immediately aborts all pending work and releases system resources. Close is permanent.

When invoked, run the Close algorithm.
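For example (informative), a sketch of basic VideoDecoder usage. The codec string is illustrative, and keyChunk and deltaChunk are hypothetical EncodedVideoChunks of type "key" and "delta" respectively:

const decoder = new VideoDecoder({
  output: (frame) => {
    /* render or process the frame */
    frame.destroy();  // release frame memory promptly
  },
  error: (e) => console.error(e),
});

decoder.configure({
  codec: 'vp8',
  codedWidth: 640,
  codedHeight: 480,
});

// Decoding must begin at a key frame; delta chunks depend on it.
decoder.decode(keyChunk);
decoder.decode(deltaChunk);
await decoder.flush();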

5. AudioEncoder Interface

[Exposed=(Window,Worker)]
interface AudioEncoder {
  constructor(AudioEncoderInit init);
  readonly attribute CodecState state;
  readonly attribute long encodeQueueSize;
  undefined configure(AudioEncoderConfig config);
  undefined encode(AudioFrame frame);
  Promise<undefined> flush();
  undefined reset();
  undefined close();
};

dictionary AudioEncoderInit {
  required EncodedAudioChunkOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback EncodedAudioChunkOutputCallback = undefined(EncodedAudioChunk output);

5.1. Internal Slots

[[codec implementation]]
Underlying encoder implementation provided by the User Agent.
[[output callback]]
Callback given at construction for encoded outputs.
[[error callback]]
Callback given at construction for encode errors.

5.2. Constructors

AudioEncoder(init)
  1. Let e be a new AudioEncoder object.

  2. Assign init.output to the [[output callback]] internal slot.

  3. Assign init.error to the [[error callback]] internal slot.

  4. Assign "unconfigured" to e.state.

  5. Return e.

5.3. Attributes

state, of type CodecState, readonly
Describes the current state of the codec.
encodeQueueSize, of type long, readonly
The number of pending encode requests. This does not include requests that have been sent to the underlying codec.

5.4. Methods

configure(config)
Enqueues a control message to configure the audio encoder for encoding frames as described by config.

When invoked, run these steps:

  1. If config is not a valid AudioEncoderConfig, throw a TypeError.

  2. Run the Configure Encoder algorithm with config.

encode(frame)
Enqueues a control message to encode the given frame.

NOTE: This method will destroy the AudioFrame. Authors who wish to retain a copy should call frame.clone() prior to calling encode().

When invoked, run these steps:

  1. If the value of frame’s [[detached]] internal slot is true, throw a TypeError.

  2. Let output algorithm be the EncodedAudioChunk Output algorithm.

  3. Run the Encode Frame algorithm with frame and output algorithm.

flush()
Completes all control messages in the control message queue and emits all outputs.

When invoked, run these steps:

  1. Let output algorithm be the EncodedAudioChunk Output algorithm.

  2. Run the Flush algorithm with output algorithm.

reset()
Immediately resets all state including configuration, control messages in the control message queue, and all pending callbacks.

When invoked, run the Reset algorithm.

close()
Immediately aborts all pending work and releases system resources. Close is permanent.

When invoked, run the Close algorithm.
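For example (informative), a sketch of AudioEncoder usage. The config values are illustrative, audioFrame names a hypothetical AudioFrame, and the snippet is assumed to run inside an async function:

const encoder = new AudioEncoder({
  output: (chunk) => { /* mux or transmit the EncodedAudioChunk */ },
  error: (e) => console.error(e),
});

encoder.configure({
  codec: 'opus',
  sampleRate: 48000,
  numberOfChannels: 2,
});

// encode() destroys its input; clone first if the frame is still needed.
encoder.encode(audioFrame);
await encoder.flush();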

6. VideoEncoder Interface

[Exposed=(Window,Worker)]
interface VideoEncoder {
  constructor(VideoEncoderInit init);
  readonly attribute CodecState state;
  readonly attribute long encodeQueueSize;
  undefined configure(VideoEncoderConfig config);
  undefined encode(VideoFrame frame, optional VideoEncoderEncodeOptions options = {});
  Promise<undefined> flush();
  undefined reset();
  undefined close();
};

dictionary VideoEncoderInit {
  required EncodedVideoChunkOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback EncodedVideoChunkOutputCallback = undefined(EncodedVideoChunk output);

6.1. Internal Slots

[[codec implementation]]
Underlying encoder implementation provided by the User Agent.
[[output callback]]
Callback given at construction for encoded outputs.
[[error callback]]
Callback given at construction for encode errors.

6.2. Constructors

VideoEncoder(init)
  1. Let e be a new VideoEncoder object.

  2. Assign init.output to the [[output callback]] internal slot.

  3. Assign init.error to the [[error callback]] internal slot.

  4. Assign "unconfigured" to e.state.

  5. Return e.

6.3. Attributes

state, of type CodecState, readonly
Describes the current state of the codec.
encodeQueueSize, of type long, readonly
The number of pending encode requests. This does not include requests that have been sent to the underlying codec.

6.4. Methods

configure(config)
Enqueues a control message to configure the video encoder for encoding frames as described by config.

When invoked, run these steps:

  1. If config is not a valid VideoEncoderConfig, throw a TypeError.

  2. Run the Configure Encoder algorithm with config.

encode(frame, options)
Enqueues a control message to encode the given frame.

NOTE: This method will destroy the VideoFrame. Authors who wish to retain a copy should call frame.clone() prior to calling encode().

When invoked, run these steps:

  1. If the value of frame’s [[detached]] internal slot is true, throw a TypeError.

  2. Let output algorithm be the EncodedVideoChunk Output algorithm.

  3. Run the Encode Frame algorithm with frame and output algorithm.

flush()
Completes all control messages in the control message queue and emits all outputs.

When invoked, run these steps:

  1. Let output algorithm be the EncodedVideoChunk Output algorithm.

  2. Run the Flush algorithm with output algorithm.

reset()
Immediately resets all state including configuration, control messages in the control message queue, and all pending callbacks.

When invoked, run the Reset algorithm.

close()
Immediately aborts all pending work and releases system resources. Close is permanent.

When invoked, run the Close algorithm.
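For example (informative), a sketch of VideoEncoder usage that forces a key frame. The config values are illustrative, frame names a hypothetical VideoFrame, and the snippet is assumed to run inside an async function:

const encoder = new VideoEncoder({
  output: (chunk) => { /* chunk.type is "key" or "delta" */ },
  error: (e) => console.error(e),
});

encoder.configure({
  codec: 'vp8',
  width: 640,
  height: 480,
});

// encode() destroys its input; clone() first to retain a copy.
const copy = frame.clone();
encoder.encode(frame, { keyFrame: true });
await encoder.flush();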

7. Decoder and Encoder Algorithms

The following algorithms run in the scope of the methods that invoke them. Mentions of attributes and internal slots refer to members of the interface that owns the invoking method.

7.1. Configure Decoder

Given either an AudioDecoderConfig or VideoDecoderConfig config, this algorithm attempts to select a codec implementation that supports config.

Run the following steps:

  1. If state is “closed”, throw an InvalidStateError.

  2. If the user agent cannot provide a codec implementation to support config, throw a NotSupportedError.

  3. Set state to "configured".

  4. Queue a control message to configure the decoder with config.

  5. Run the control message processing loop.

Running a control message to configure the decoder means running these steps:

  1. Assign [[codec implementation]] with an implementation supporting config.

7.2. Decode Chunk (with chunk and output algorithm)

Run these steps:
  1. If state is not "configured", throw an InvalidStateError.

  2. Increment decodeQueueSize.

  3. Queue a control message to decode the chunk.

  4. Run the control message processing loop.

Running a control message to decode the chunk means running these steps:

  1. Decrement decodeQueueSize.

  2. Let codec implementation queue be the result of starting a new parallel queue.

  3. Enqueue the following steps to codec implementation queue:

    1. Attempt to use [[codec implementation]] to decode the chunk.

    2. If decoding results in an error, queue a task on the media element task source to run the Codec Error algorithm.

    3. Otherwise, for each output, queue a task on the media element task source to run the provided output algorithm.

7.3. Flush

Given an output algorithm, this algorithm flushes all pending outputs to the output callback.

Run these steps:

  1. If state is not "configured", return a Promise rejected with a newly created InvalidStateError.

  2. Let promise be a new Promise.

  3. Queue a control message to flush the codec with promise and output algorithm.

  4. Return promise.

Running a control message to flush the codec means running these steps with promise and output algorithm:

  1. Signal [[codec implementation]] to emit all pending outputs.

  2. For each output, run output algorithm.

  3. Resolve promise.

7.4. Codec Error

This algorithm fires the error callback and permanently closes the codec.

Run these steps:

  1. Cease processing of the control message queue.

  2. Run the Close algorithm with EncodingError.

7.5. AudioFrame Output

Run these steps:
  1. If state is not “configured”, abort the following steps.

  2. Let buffer be an AudioBuffer containing the decoded audio data.

  3. Let frame be an AudioFrame containing buffer and a timestamp for the output.

  4. Invoke [[output callback]] with frame.

7.6. VideoFrame Output

Run these steps:
  1. If state is not “configured”, abort the following steps.

  2. Let planes be a sequence of Planes containing the decoded video frame data.

  3. Let pixelFormat be the PixelFormat of planes.

  4. Let frameInit be a VideoFrameInit with the following keys:

    1. Let timestamp and duration be the presentation timestamp and duration from the EncodedVideoChunk associated with this output.

    2. Let codedWidth and codedHeight be the width and height of the decoded video frame in pixels, prior to any cropping or aspect ratio adjustments.

    3. Let cropLeft, cropTop, cropWidth, and cropHeight be the crop region of the decoded video frame in pixels, prior to any aspect ratio adjustments.

    4. Let displayWidth and displayHeight be the display size of the decoded video frame in pixels.

  5. Let frame be a VideoFrame, constructed with pixelFormat, planes, and frameInit.

  6. Invoke [[output callback]] with frame.

7.7. Reset

Run these steps:
  1. If state is “closed”, throw an InvalidStateError.

  2. Set state to “unconfigured”.

  3. Signal [[codec implementation]] to cease producing output for the previous configuration.

NOTE: Some tasks to emit outputs may already be queued in the event loop. These outputs will be dropped by the output algorithms, which abort if state is not “configured”.

  4. For each control message in the control message queue:

    1. If a control message has an associated promise, reject the promise.

    2. Remove the message from the queue.

7.8. Close (with error)

Run these steps:
  1. Run the Reset algorithm.

  2. Set state to “closed”.

  3. Clear [[codec implementation]] and release associated system resources.

  4. If error is set, invoke [[error callback]] with error.

7.9. Configure Encoder (with config)

Run the following steps:
  1. If state is "closed", throw an InvalidStateError.

  2. If the user agent cannot provide a codec implementation to support config, throw a NotSupportedError.

  3. Set state to "configured".

  4. Queue a control message to configure the encoder using config.

  5. Run the control message processing loop.

Running a control message to configure the encoder means running these steps:

  1. Assign [[codec implementation]] with an implementation supporting config.

7.10. Encode Frame (with frame, options, and output algorithm)

Run these steps:
  1. If state is not "configured", throw an InvalidStateError.

  2. If the value of frame’s [[detached]] internal slot is true, throw a TypeError.

  3. Let frameClone hold the result of running the Clone Frame algorithm with frame.

  4. Destroy the original frame by invoking frame.destroy().

  5. Increment encodeQueueSize.

  6. Queue a control message to encode frameClone with options and output algorithm.

  7. Run the control message processing loop.

Running a control message to encode the frame means running these steps.

  1. Decrement encodeQueueSize.

  2. Let codec implementation queue be the result of starting a new parallel queue.

  3. Enqueue the following steps to codec implementation queue:

    1. Attempt to use [[codec implementation]] and options to encode frameClone.

    2. If encoding results in an error, queue a task on the media element task source to run the Codec Error algorithm.

    3. Otherwise, for each output, queue a task on the media element task source to run the provided output algorithm.

7.11. EncodedAudioChunk Output

Run these steps:
  1. If state is not “configured”, abort the following steps.

  2. Let chunkInit be an EncodedAudioChunkInit with the following keys:

    1. Let data contain the encoded audio data.

    2. Let type be the EncodedAudioChunkType of the encoded audio data.

    3. Let timestamp be the timestamp from the associated input AudioFrame.

    4. Let duration be the duration from the associated input AudioFrame.

  3. Let chunk be a new EncodedAudioChunk constructed with chunkInit.

  4. Invoke [[output callback]] with chunk.

7.12. EncodedVideoChunk Output

Run these steps:
  1. If state is not “configured”, abort the following steps.

  2. Let chunkInit be an EncodedVideoChunkInit with the following keys:

    1. Let data contain the encoded video data.

    2. Let type be the EncodedVideoChunkType of the encoded video data.

    3. Let timestamp be the timestamp from the associated input VideoFrame.

    4. Let duration be the duration from the associated input VideoFrame.

  3. Let chunk be a new EncodedVideoChunk constructed with chunkInit.

  4. Invoke [[output callback]] with chunk.

8. Configurations

8.1. Codec String

In other media specifications, codec strings historically accompanied MIME types as the “codecs=” parameter (isTypeSupported(), canPlayType()). In this specification, encoded media is not containerized; hence, only the value of the codecs parameter is accepted.

A valid codec string must meet the following conditions.

  1. It is valid per the relevant codec specification (see examples below).

NOTE: This needs more work. We might consider a registry of specs/strings.

  2. It describes a single codec.

NOTE: Not a comma separated list.

  3. It is unambiguous about codec profile and level for codecs that define these concepts.

NOTE: There is no unified specification for codec strings. Each codec has its own unique string format, specified by the authors of the codec (see, for example, [RFC6381]).
NOTE: Some valid examples include: 'vp8', 'vp09.00.10.08', 'avc1.4D401E', 'opus', 'mp4a.40.2', 'flac'

Invalid examples include: 'video/webm; codecs="vp8"' (a full MIME type rather than a bare codec string), 'vp8, opus' (a comma-separated list), and 'vp9' (ambiguous about profile and level).

8.2. AudioDecoderConfig

dictionary AudioDecoderConfig {
  required DOMString codec;
  required unsigned long sampleRate;
  required unsigned long numberOfChannels;
  BufferSource description;
};

To check if an AudioDecoderConfig is a valid AudioDecoderConfig, run these steps:

  1. If codec is not a valid codec string, return false.

  2. Return true.

codec, of type DOMString
Contains a codec string describing the codec.
sampleRate, of type unsigned long
The number of frame samples per second.
numberOfChannels, of type unsigned long
The number of audio channels.
description, of type BufferSource
A sequence of codec specific bytes, commonly known as extradata.

NOTE: For example, the vorbis “code book”.
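For example (informative), a config for a stereo Opus stream; the values are illustrative:

const config = {
  codec: 'opus',
  sampleRate: 48000,
  numberOfChannels: 2,
  // description: codec-specific extradata bytes, when required
};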

8.3. VideoDecoderConfig

dictionary VideoDecoderConfig {
  required DOMString codec;
  BufferSource description;
  required unsigned long codedWidth;
  required unsigned long codedHeight;
  unsigned long cropLeft;
  unsigned long cropTop;
  unsigned long cropWidth;
  unsigned long cropHeight;
  unsigned long displayWidth;
  unsigned long displayHeight;
};

To check if a VideoDecoderConfig is a valid VideoDecoderConfig, run these steps:

  1. If codec is not a valid codec string, return false.

  2. If codedWidth = 0 or codedHeight = 0, return false.

  3. If cropWidth = 0 or cropHeight = 0, return false.

  4. If cropTop + cropHeight > codedHeight, return false.

  5. If cropLeft + cropWidth > codedWidth, return false.

  6. If displayWidth = 0 or displayHeight = 0, return false.

  7. Return true.

codec, of type DOMString
Contains a codec string describing the codec.
description, of type BufferSource
A sequence of codec specific bytes, commonly known as extradata.

NOTE: For example, the VP9 vpcC bytes.

codedWidth, of type unsigned long
Width of the VideoFrame in pixels, prior to any cropping or aspect ratio adjustments.
codedHeight, of type unsigned long
Height of the VideoFrame in pixels, prior to any cropping or aspect ratio adjustments.
cropLeft, of type unsigned long
The number of pixels to remove from the left of the VideoFrame, prior to aspect ratio adjustments. Defaults to zero if not present.
cropTop, of type unsigned long
The number of pixels to remove from the top of the VideoFrame, prior to aspect ratio adjustments. Defaults to zero if not present.
cropWidth, of type unsigned long
The width of pixels to include in the crop, starting from cropLeft. Defaults to codedWidth if not present.
cropHeight, of type unsigned long
The height of pixels to include in the crop, starting from cropTop. Defaults to codedHeight if not present.
displayWidth, of type unsigned long
Width of the VideoFrame when displayed. Defaults to cropWidth if not present.
displayHeight, of type unsigned long
Height of the VideoFrame when displayed. Defaults to cropHeight if not present.
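For example (informative), 1080p content is commonly carried in coded frames whose height is rounded up to a block multiple and then cropped for display; the values below are illustrative:

const config = {
  codec: 'vp09.00.10.08',
  codedWidth: 1920,
  codedHeight: 1088,   // coded size before cropping
  cropLeft: 0,
  cropTop: 0,
  cropWidth: 1920,
  cropHeight: 1080,    // visible region
  displayWidth: 1920,  // defaults to cropWidth if omitted
  displayHeight: 1080, // defaults to cropHeight if omitted
};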

8.4. AudioEncoderConfig

dictionary AudioEncoderConfig {
  required DOMString codec;
  unsigned long sampleRate;
  unsigned long numberOfChannels;
};

To check if an AudioEncoderConfig is a valid AudioEncoderConfig, run these steps:

  1. If codec is not a valid codec string, return false.

  2. Return true.

codec, of type DOMString
Contains a codec string describing the codec.
sampleRate, of type unsigned long
The number of frame samples per second.
numberOfChannels, of type unsigned long
The number of audio channels.

8.5. VideoEncoderConfig

dictionary VideoEncoderConfig {
  required DOMString codec;
  unsigned long long bitrate;
  required unsigned long width;
  required unsigned long height;
};

To check if a VideoEncoderConfig is a valid VideoEncoderConfig, run these steps:

  1. If codec is not a valid codec string, return false.

  2. If width = 0 or height = 0, return false.

  3. Return true.

width, of type unsigned long
The expected cropWidth of input VideoFrames to encode.
height, of type unsigned long
The expected cropHeight of input VideoFrames to encode.

8.6. VideoEncoderEncodeOptions

dictionary VideoEncoderEncodeOptions {
  boolean keyFrame;
};

keyFrame, of type boolean
Indicates whether the given frame MUST be encoded as a key frame.

8.7. CodecState

enum CodecState {
  "unconfigured",
  "configured",
  "closed"
};

unconfigured
The codec is not configured for encoding or decoding.
configured
A valid configuration has been provided. The codec is ready for encoding or decoding.
closed
The codec is no longer usable and underlying system resources have been released.

8.8. WebCodecsErrorCallback

callback WebCodecsErrorCallback = undefined(DOMException error);

9. Encoded Media Interfaces (Chunks)

These interfaces represent chunks of encoded media.

9.1. EncodedAudioChunk Interface

interface EncodedAudioChunk {
  constructor(EncodedAudioChunkInit init);
  readonly attribute EncodedAudioChunkType type;
  readonly attribute unsigned long long timestamp;  // microseconds
  readonly attribute ArrayBuffer data;
};

dictionary EncodedAudioChunkInit {
  required EncodedAudioChunkType type;
  required unsigned long long timestamp;
  required BufferSource data;
};

enum EncodedAudioChunkType {
    "key",
    "delta",
};

9.1.1. Constructors

EncodedAudioChunk(init)
  1. Let chunk be a new EncodedAudioChunk object, initialized as follows:

    1. Assign init.type to chunk.type.

    2. Assign init.timestamp to chunk.timestamp.

    3. Assign a copy of init.data to chunk.data.

  2. Return chunk.

9.1.2. Attributes

type, of type EncodedAudioChunkType, readonly
Describes whether the chunk is a key frame.
timestamp, of type unsigned long long, readonly
The presentation timestamp, given in microseconds.
data, of type ArrayBuffer, readonly
A sequence of bytes containing encoded audio data.

9.2. EncodedVideoChunk Interface

[Exposed=(Window,Worker)]
interface EncodedVideoChunk {
  constructor(EncodedVideoChunkInit init);
  readonly attribute EncodedVideoChunkType type;
  readonly attribute unsigned long long timestamp;  // microseconds
  readonly attribute unsigned long long? duration;  // microseconds
  readonly attribute ArrayBuffer data;
};

dictionary EncodedVideoChunkInit {
  required EncodedVideoChunkType type;
  required unsigned long long timestamp;
  unsigned long long duration;
  required BufferSource data;
};

enum EncodedVideoChunkType {
    "key",
    "delta",
};

9.2.1. Constructors

EncodedVideoChunk(init)
  1. Let chunk be a new EncodedVideoChunk object, initialized as follows:

    1. Assign init.type to chunk.type.

    2. Assign init.timestamp to chunk.timestamp.

    3. If duration is present in init, assign init.duration to chunk.duration. Otherwise, assign null to chunk.duration.

    4. Assign a copy of init.data to chunk.data.

  2. Return chunk.

9.2.2. Attributes

type, of type EncodedVideoChunkType, readonly
Describes whether the chunk is a key frame or not.
timestamp, of type unsigned long long, readonly
The presentation timestamp, given in microseconds.
duration, of type unsigned long long, readonly, nullable
The presentation duration, given in microseconds.
data, of type ArrayBuffer, readonly
A sequence of bytes containing encoded video data.
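For example (informative), constructing a chunk from demuxed bytes; data names a hypothetical Uint8Array of encoded video bytes:

const chunk = new EncodedVideoChunk({
  type: 'key',
  timestamp: 0,     // microseconds
  duration: 16667,  // microseconds, optional (~1/60 s)
  data: data,       // copied by the constructor
});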

10. Raw Media Interfaces (Frames)

These interfaces represent unencoded (raw) media.

10.1. AudioFrame Interface

[Exposed=(Window,Worker)]
interface AudioFrame {
  constructor(AudioFrameInit init);
  readonly attribute unsigned long long timestamp;
  readonly attribute AudioBuffer? buffer;
  undefined close();
};

dictionary AudioFrameInit {
  required unsigned long long timestamp;
  required AudioBuffer buffer;
};

10.1.1. Internal Slots

[[detached]]
Boolean indicating whether close() was invoked and underlying resources have been released.

10.1.2. Constructors

AudioFrame(init)
  1. Let frame be a new AudioFrame object.

  2. Assign init.timestamp to frame.timestamp.

  3. Assign init.buffer to frame.buffer.

  4. Assign false to the [[detached]] internal slot.

  5. Return frame.

10.1.3. Attributes

timestamp, of type unsigned long long, readonly
The presentation timestamp, given in microseconds.
buffer, of type AudioBuffer, readonly, nullable
The buffer containing decoded audio data.

10.1.4. Methods

close()
Immediately frees system resources. When invoked, run these steps:
  1. Release system resources for buffer and set its value to null.

  2. Assign true to the [[detached]] internal slot.

NOTE: This section needs work. We should use the name and semantics of VideoFrame destroy(). Similarly, we should add clone() to make a deep copy.
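For example (informative), an AudioFrame may be constructed from a Web Audio AudioBuffer; the buffer parameters are illustrative:

// A 10 ms stereo buffer at 48 kHz.
const buffer = new AudioBuffer({
  length: 480,
  sampleRate: 48000,
  numberOfChannels: 2,
});
/* ...fill the channel data... */

const frame = new AudioFrame({ timestamp: 0, buffer });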

10.2. VideoFrame Interface

[Exposed=(Window,Worker)]
interface VideoFrame {
  constructor(ImageBitmap imageBitmap, VideoFrameInit frameInit);
  constructor(PixelFormat pixelFormat, sequence<(Plane or PlaneInit)> planes,
              VideoFrameInit frameInit);

  readonly attribute PixelFormat format;
  readonly attribute FrozenArray<Plane> planes;
  readonly attribute unsigned long codedWidth;
  readonly attribute unsigned long codedHeight;
  readonly attribute unsigned long cropLeft;
  readonly attribute unsigned long cropTop;
  readonly attribute unsigned long cropWidth;
  readonly attribute unsigned long cropHeight;
  readonly attribute unsigned long displayWidth;
  readonly attribute unsigned long displayHeight;
  readonly attribute unsigned long long? duration;
  readonly attribute unsigned long long? timestamp;

  undefined destroy();
  VideoFrame clone();

  Promise<ImageBitmap> createImageBitmap(
    optional ImageBitmapOptions options = {});

};

dictionary VideoFrameInit {
  unsigned long codedWidth;
  unsigned long codedHeight;
  unsigned long cropLeft;
  unsigned long cropTop;
  unsigned long cropWidth;
  unsigned long cropHeight;
  unsigned long displayWidth;
  unsigned long displayHeight;
  unsigned long long duration;
  unsigned long long timestamp;
};

10.2.1. Internal Slots

[[detached]]
Boolean indicating whether destroy() was invoked and underlying resources have been released.

10.2.2. Constructors

NOTE: this section needs work. Current wording assumes a VideoFrame can always be easily represented using one of the known pixel formats. In practice, the underlying UA resources may be GPU backed or formatted in such a way that conversion to an allowed pixel format requires expensive copies and translation. When this occurs, we should allow planes to be null and format to be “opaque” to avoid early optimization. We should make conversion explicit and user controlled by offering a videoFrame.convertTo(format) that returns a Promise containing a new VideoFrame for which the copies/translations are performed.

VideoFrame(imageBitmap, frameInit)

  1. If frameInit is not a valid VideoFrameInit, throw a TypeError.

  2. If the value of imageBitmap’s [[Detached]] internal slot is set to true, then throw an InvalidStateError DOMException.

  3. Let frame be a new VideoFrame.

  4. Assign false to frame’s [[detached]] internal slot.

  5. Use a copy of the pixel data in imageBitmap to initialize the following frame attributes:

    1. Initialize frame.format to the underlying format of imageBitmap.

    2. Initialize frame.planes to describe the arrangement of memory of the copied pixel data.

    3. Assign regions of the copied pixel data to the [[plane buffer]] internal slot of each plane as appropriate for the pixel format.

    4. Initialize frame.codedWidth and frame.codedHeight to the width and height of the imageBitmap, prior to any cropping or aspect ratio adjustments.

  6. Use frameInit to initialize the remaining frame attributes:

    1. If frameInit.cropLeft is present, assign it to frame.cropLeft. Otherwise, default frame.cropLeft to zero.

    2. If frameInit.cropTop is present, assign it to frame.cropTop. Otherwise, default frame.cropTop to zero.

    3. If frameInit.cropWidth is present, assign it to frame.cropWidth. Otherwise, default frame.cropWidth to frame.codedWidth.

    4. If frameInit.cropHeight is present, assign it to frame.cropHeight. Otherwise, default frame.cropHeight to frame.codedHeight.

    5. If frameInit.displayWidth is present, assign it to frame.displayWidth. Otherwise, default frame.displayWidth to frame.codedWidth.

    6. If frameInit.displayHeight is present, assign it to frame.displayHeight. Otherwise, default frame.displayHeight to frame.codedHeight.

    7. If frameInit.duration is present, assign it to frame.duration. Otherwise, default frame.duration to null.

    8. If frameInit.timestamp is present, assign it to frame.timestamp. Otherwise, default frame.timestamp to null.

  7. Return frame.

VideoFrame(pixelFormat, planes, frameInit)

  1. If either codedWidth or codedHeight is not present in frameInit, throw a TypeError.

  2. If frameInit is not a valid VideoFrameInit, throw a TypeError.

  3. If the length of planes is incompatible with the given pixelFormat, throw a TypeError.

  4. Let frame be a new VideoFrame object.

  5. Assign false to frame’s [[detached]] internal slot.

  6. Assign pixelFormat to frame.format.

  7. For each element p in planes:

    1. If p is a Plane, append a copy of p to frame.planes. Continue processing the next element.

    2. If p is a PlaneInit, append a new Plane q to frame.planes initialized as follows:

      1. Assign a copy of p.src to q’s [[plane buffer]] internal slot.

      NOTE: The samples should be copied exactly, but the user agent may add row padding as needed to improve memory alignment.

      2. Assign the width of each row in [[plane buffer]], including any padding, to q.stride.

      3. Assign p.rows to q.rows.

      4. Assign the product of (q.rows * q.stride) to q.length.

  8. Assign frameInit.codedWidth to frame.codedWidth.

  9. Assign frameInit.codedHeight to frame.codedHeight.

  10. If frameInit.cropLeft is present, assign it to frame.cropLeft. Otherwise, default frame.cropLeft to zero.

  11. If frameInit.cropTop is present, assign it to frame.cropTop. Otherwise, default frame.cropTop to zero.

  12. If frameInit.cropWidth is present, assign it to frame.cropWidth. Otherwise, default frame.cropWidth to frame.codedWidth.

  13. If frameInit.cropHeight is present, assign it to frame.cropHeight. Otherwise, default frame.cropHeight to frame.codedHeight.

  14. If frameInit.displayWidth is present, assign it to frame.displayWidth. Otherwise, default frame.displayWidth to frame.codedWidth.

  15. If frameInit.displayHeight is present, assign it to frame.displayHeight. Otherwise, default frame.displayHeight to frame.codedHeight.

  16. If frameInit.duration is present, assign it to frame.duration. Otherwise, default frame.duration to null.

  17. If frameInit.timestamp is present, assign it to frame.timestamp. Otherwise, default frame.timestamp to null.

  18. Return frame.
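For example (informative), the ImageBitmap constructor can wrap a copy of pixels from an existing image. Here img names a hypothetical HTMLImageElement, and the snippet is assumed to run inside an async function:

const bitmap = await createImageBitmap(img);

// codedWidth and codedHeight are taken from the bitmap; crop and
// display sizes default from them when omitted from frameInit.
const frame = new VideoFrame(bitmap, { timestamp: 0 });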

10.2.3. Attributes

timestamp, of type unsigned long long, readonly, nullable
The presentation timestamp, given in microseconds. The timestamp is copied from the EncodedVideoChunk corresponding to this VideoFrame.
duration, of type unsigned long long, readonly, nullable
The presentation duration, given in microseconds. The duration is copied from the EncodedVideoChunk corresponding to this VideoFrame.
format, of type PixelFormat, readonly
Describes the arrangement of bytes in each plane as well as the number and order of the planes.
planes, of type FrozenArray<Plane>, readonly
Holds pixel data, laid out as described by format and the Plane attributes.
codedWidth, of type unsigned long, readonly
Width of the VideoFrame in pixels, prior to any cropping or aspect ratio adjustments.
codedHeight, of type unsigned long, readonly
Height of the VideoFrame in pixels, prior to any cropping or aspect ratio adjustments.
cropLeft, of type unsigned long, readonly
The number of pixels to remove from the left of the VideoFrame, prior to aspect ratio adjustments.
cropTop, of type unsigned long, readonly
The number of pixels to remove from the top of the VideoFrame, prior to aspect ratio adjustments.
cropWidth, of type unsigned long, readonly
The width of pixels to include in the crop, starting from cropLeft.
cropHeight, of type unsigned long, readonly
The height of pixels to include in the crop, starting from cropTop.
displayWidth, of type unsigned long, readonly
Width of the VideoFrame when displayed.
displayHeight, of type unsigned long, readonly
Height of the VideoFrame when displayed.

10.2.4. Methods

destroy()
Immediately frees system resources. Destruction applies to all references, including references that are serialized and passed across Realms.

NOTE: Use clone() to create a deep copy. Cloned frames have their own lifetime and will not be affected by destroying the original frame.

When invoked, run these steps:

  1. If [[detached]] is true, throw an InvalidStateError.

  2. Remove all Planes from planes and release associated memory.

  3. Assign true to the [[detached]] internal slot.

clone()
Creates a new VideoFrame with a separate lifetime containing a deep copy of this frame’s resources.

NOTE: VideoFrames may require a large amount of memory. Use clone() sparingly. Authors should take care to manage frame lifetimes by calling destroy() immediately when frames are no longer needed.

When invoked, run the following steps:

  1. If the value of the [[detached]] internal slot is true, throw an InvalidStateError.

  2. Return the result of running the Clone Frame algorithm with this.

createImageBitmap(options)
Creates an ImageBitmap from this VideoFrame.

When invoked, run these steps:

  1. Let p be a new Promise.

  2. If either options’ resizeWidth or resizeHeight is present and is 0, then return p rejected with an InvalidStateError DOMException.

  3. If this’s [[detached]] internal slot is set to true, then return p rejected with an InvalidStateError DOMException.

  4. Let imageBitmap be a new ImageBitmap object.

  5. Set imageBitmap’s bitmap data to a copy of the VideoFrame pixel data, at the frame’s intrinsic width and intrinsic height (i.e., after any aspect-ratio correction has been applied), cropped to the source rectangle with formatting.

  6. If the origin of imageBitmap’s image is not same origin with entry settings object’s origin, then set the origin-clean flag of imageBitmap’s bitmap to false.

  7. Run this step in parallel:

    1. Resolve p with imageBitmap.

  8. Return p.
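For example (informative), a decoded frame can be painted to a canvas. Here ctx names a hypothetical CanvasRenderingContext2D, and the snippet is assumed to run inside an async function:

const bitmap = await frame.createImageBitmap();
ctx.drawImage(bitmap, 0, 0);
frame.destroy();  // release the frame once it is no longer needed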

10.2.5. Algorithms

To check if a VideoFrameInit is a valid VideoFrameInit, run these steps:
  1. If codedWidth = 0 or codedHeight = 0, return false.

  2. If cropWidth = 0 or cropHeight = 0, return false.

  3. If cropTop + cropHeight > codedHeight, return false.

  4. If cropLeft + cropWidth > codedWidth, return false.

  5. If displayWidth = 0 or displayHeight = 0, return false.

  6. Return true.

10.3. Plane Interface

A Plane acts like a thin wrapper around an ArrayBuffer, but may actually be backed by a texture. Planes hide any padding before the first sample or after the last row.

A Plane is solely constructed by its VideoFrame. During construction, the User Agent may use knowledge of the frame’s PixelFormat to add padding to the Plane to improve memory alignment.

A Plane cannot be used after the VideoFrame is destroyed. A new VideoFrame can be assembled from existing Planes, and the new VideoFrame will remain valid when the original is destroyed. This makes it possible to efficiently add an alpha plane to an existing VideoFrame.

interface Plane {
  readonly attribute unsigned long stride;
  readonly attribute unsigned long rows;
  readonly attribute unsigned long length;

  undefined readInto(ArrayBufferView dst);
};

dictionary PlaneInit {
  required BufferSource src;
  required unsigned long stride;
  required unsigned long rows;
};

10.3.1. Internal Slots

[[parent frame]]
Refers to the VideoFrame that constructed and owns this plane.
[[plane buffer]]
Internal storage for the plane’s pixel data.

10.3.2. Attributes

stride, of type unsigned long, readonly
The width of each row including any padding.
rows, of type unsigned long, readonly
The number of rows.
length, of type unsigned long, readonly
The total byte length of the plane (stride * rows).

10.3.3. Methods

readInto(dst)

Copies the plane data into dst.

When invoked, run these steps:

  1. If [[parent frame]] has been destroyed, throw an InvalidStateError.

  2. If length is greater than dst.byteLength, throw a TypeError.

  3. Copy the [[plane buffer]] into dst.
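For example (informative), copying the first (Y) plane of an I420 frame into CPU-visible memory; frame names a hypothetical VideoFrame:

const plane = frame.planes[0];
const dst = new Uint8Array(plane.length);  // length = stride * rows
plane.readInto(dst);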

10.4. Pixel Format

Pixel formats describe the arrangement of bytes in each plane as well as the number and order of the planes.

NOTE: This section needs work. We expect to add more pixel formats and offer much more verbose definitions. For now, please see http://www.fourcc.org/pixel-format/yuv-i420/ for a more complete description.

enum PixelFormat {
  "I420"
};

I420
Planar 4:2:0 YUV.

10.5. Algorithms

10.6. Clone Frame (with frame)

  1. Let cloneFrame be a new object of the same type as frame (either AudioFrame or VideoFrame).

  2. Initialize each attribute and internal slot of cloneFrame with a copy of the value from the corresponding attribute or internal slot of frame.

NOTE: User Agents are encouraged to avoid expensive copies of large objects (for instance, VideoFrame pixel data). Frame types are immutable, so the above step may be implemented using memory sharing techniques such as reference counting.

  3. Return cloneFrame.

11. VideoTrackReader Interface

VideoTrackReader emits VideoFrames from a MediaStreamTrack. Authors may use this interface to manipulate, render, or encode streams from getUserMedia() and getDisplayMedia().
[Exposed=Window]
interface VideoTrackReader {
  constructor(MediaStreamTrack track);

  readonly attribute VideoTrackReaderState readyState;
  attribute EventHandler onended;

  undefined start(VideoFrameOutputCallback callback);
  undefined stop();
};

enum VideoTrackReaderState {
  "started",
  "stopped",
  "ended"
};

11.1. VideoTrackReaderState Values

started
Indicates that the [[track]] is live and VideoFrames are being output to the [[callback]] provided to start().
stopped
Indicates that the [[track]] is live, but the [[callback]] is not set, so no VideoFrames are being output.
ended
Indicates that the [[track]] is ended and this object can no longer be started nor stopped.

11.2. Internal Slots

[[track]]
The MediaStreamTrack provided at construction.
[[callback]]
The VideoFrameOutputCallback assigned by the last call to start().

11.3. Constructors

VideoTrackReader(track)
  1. If track.kind is not "video", throw a TypeError.

  2. If track.readyState is "ended", throw an InvalidStateError.

  3. Let reader be a new VideoTrackReader object.

  4. Assign track to the [[track]] internal slot.

  5. Assign "stopped" to reader.readyState.

  6. Return reader.

11.4. Attributes

readyState, of type VideoTrackReaderState, readonly
Indicates the current state of the VideoTrackReader object.
onended, of type EventHandler
The event handler for the ended event.

11.5. Event Summary

ended
Dispatched when the [[track]]'s readyState becomes "ended", indicating no further VideoFrames will be output.

11.6. Methods

start(callback)

Starts calling the callback with VideoFrames from the MediaStreamTrack.

When invoked, run these steps:

  1. If readyState is not "stopped", throw an InvalidStateError.

  2. Assign "started" to readyState.

  3. Assign callback to the [[callback]] internal slot.

  4. In parallel, run the track monitor.

stop()

Stops calling the VideoFrameOutputCallback with VideoFrames from the MediaStreamTrack.

When invoked, run these steps:

  1. If readyState is not "started", throw an InvalidStateError.

  2. Cease running the track monitor.

  3. Assign "stopped" to the readyState.

  4. Assign null to the [[callback]] internal slot.
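For example (informative), frames from a camera may be delivered to a callback and then encoded. The snippet is assumed to run inside an async function, and encoder names a hypothetical configured VideoEncoder:

const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const [track] = stream.getVideoTracks();

const reader = new VideoTrackReader(track);
reader.start((frame) => {
  encoder.encode(frame);  // encode() assumes ownership of the frame
});

/* ...later... */
reader.stop();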

11.7. MediaStreamTrack Monitoring

The track monitor may be started and stopped by authors, via start() and stop(), to control the invocation of the VideoFrameOutputCallback.

Running the track monitor means monitoring [[track]] for the arrival of new picture data as well as changes to [[track]].readyState.

While [[track]].readyState is "live", for each new picture that arrives in [[track]], execute the following steps:

NOTE: Pictures that arrived prior to the start of this loop are not considered.

NOTE: Video data in a MediaStreamTrack does not have a canonical binary form. The user agent should tokenize "pictures" by discrete times of capture. For example, if the source is a camera capturing 60 frames per second, the UA should construct 60 corresponding VideoFrames each second.

  1. Let planes be a sequence of Planes containing the picture data.

  2. Let pixelFormat be the PixelFormat of planes.

    NOTE: This section needs work. The UA should avoid early optimizations to convert between PixelFormats, but currently only a narrow set of formats is defined in this spec. We should consider adding an "opaque" format along with a converter API to make pixel format conversion explicit and user controlled.

  3. Let frameInit be a VideoFrameInit with the following keys:

    1. Let timestamp and duration be the presentation timestamp and (optionally) presentation duration as determined by the [[track]] source.

    2. Let codedWidth and codedHeight be the width and height of the decoded video frame in pixels, prior to any cropping or aspect ratio adjustments.

    3. Let cropLeft, cropTop, cropWidth, and cropHeight be the crop region of the decoded video frame in pixels, prior to any aspect ratio adjustments.

    4. Let displayWidth and displayHeight be the display size of the decoded video frame in pixels.

  4. Let frame be a VideoFrame, constructed with pixelFormat, planes, and frameInit.

  5. Invoke [[callback]] with frame.

If [[track]].readyState becomes "ended", queue a task on the media element task source to run the following steps:

  1. Set readyState to "ended".

  2. Queue a task on the media element task source to fire a simple event named ended at the VideoTrackReader.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.


References

Normative References

[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[MEDIA-SOURCE]
Matthew Wolenetz; et al. Media Source Extensions™. 17 November 2016. REC. URL: https://www.w3.org/TR/media-source/
[MEDIACAPTURE-STREAMS]
Cullen Jennings; et al. Media Capture and Streams. 29 September 2020. CR. URL: https://www.w3.org/TR/mediacapture-streams/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WEBAUDIO]
Paul Adenot; Hongchan Choi. Web Audio API. 11 June 2020. CR. URL: https://www.w3.org/TR/webaudio/
[WebIDL]
Boris Zbarsky. Web IDL. 15 December 2016. ED. URL: https://heycam.github.io/webidl/

Informative References

[RFC6381]
R. Gellens; D. Singer; P. Frojdh. The 'Codecs' and 'Profiles' Parameters for "Bucket" Media Types. August 2011. Proposed Standard. URL: https://tools.ietf.org/html/rfc6381

IDL Index

[Exposed=(Window,Worker)]
interface AudioDecoder {
  constructor(AudioDecoderInit init);

  readonly attribute CodecState state;
  readonly attribute long decodeQueueSize;

  undefined configure(AudioDecoderConfig config);
  undefined decode(EncodedAudioChunk chunk);
  Promise<undefined> flush();
  undefined reset();
  undefined close();
};

dictionary AudioDecoderInit {
  required AudioFrameOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback AudioFrameOutputCallback = undefined(AudioFrame output);


[Exposed=(Window,Worker)]
interface VideoDecoder {
  constructor(VideoDecoderInit init);

  readonly attribute CodecState state;
  readonly attribute long decodeQueueSize;

  undefined configure(VideoDecoderConfig config);
  undefined decode(EncodedVideoChunk chunk);
  Promise<undefined> flush();
  undefined reset();
  undefined close();
};

dictionary VideoDecoderInit {
  required VideoFrameOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback VideoFrameOutputCallback = undefined(VideoFrame output);


[Exposed=(Window,Worker)]
interface AudioEncoder {
  constructor(AudioEncoderInit init);
  readonly attribute CodecState state;
  readonly attribute long encodeQueueSize;
  undefined configure(AudioEncoderConfig config);
  undefined encode(AudioFrame frame);
  Promise<undefined> flush();
  undefined reset();
  undefined close();
};

dictionary AudioEncoderInit {
  required EncodedAudioChunkOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback EncodedAudioChunkOutputCallback = undefined(EncodedAudioChunk output);


[Exposed=(Window,Worker)]
interface VideoEncoder {
  constructor(VideoEncoderInit init);
  readonly attribute CodecState state;
  readonly attribute long encodeQueueSize;
  undefined configure(VideoEncoderConfig config);
  undefined encode(VideoFrame frame, optional VideoEncoderEncodeOptions options = {});
  Promise<undefined> flush();
  undefined reset();
  undefined close();
};

dictionary VideoEncoderInit {
  required EncodedVideoChunkOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback EncodedVideoChunkOutputCallback = undefined(EncodedVideoChunk output);


dictionary AudioDecoderConfig {
  required DOMString codec;
  required unsigned long sampleRate;
  required unsigned long numberOfChannels;
  BufferSource description;
};


dictionary VideoDecoderConfig {
  required DOMString codec;
  BufferSource description;
  required unsigned long codedWidth;
  required unsigned long codedHeight;
  unsigned long cropLeft;
  unsigned long cropTop;
  unsigned long cropWidth;
  unsigned long cropHeight;
  unsigned long displayWidth;
  unsigned long displayHeight;
};


dictionary AudioEncoderConfig {
  required DOMString codec;
  unsigned long sampleRate;
  unsigned long numberOfChannels;
};


dictionary VideoEncoderConfig {
  required DOMString codec;
  unsigned long long bitrate;
  required unsigned long width;
  required unsigned long height;
};


dictionary VideoEncoderEncodeOptions {
  boolean keyFrame;
};


enum CodecState {
  "unconfigured",
  "configured",
  "closed"
};


callback WebCodecsErrorCallback = undefined(DOMException error);


interface EncodedAudioChunk {
  constructor(EncodedAudioChunkInit init);
  readonly attribute EncodedAudioChunkType type;
  readonly attribute unsigned long long timestamp;  // microseconds
  readonly attribute ArrayBuffer data;
};

dictionary EncodedAudioChunkInit {
  required EncodedAudioChunkType type;
  required unsigned long long timestamp;
  required BufferSource data;
};

enum EncodedAudioChunkType {
    "key",
    "delta",
};


[Exposed=(Window,Worker)]
interface EncodedVideoChunk {
  constructor(EncodedVideoChunkInit init);
  readonly attribute EncodedVideoChunkType type;
  readonly attribute unsigned long long timestamp;  // microseconds
  readonly attribute unsigned long long? duration;  // microseconds
  readonly attribute ArrayBuffer data;
};

dictionary EncodedVideoChunkInit {
  required EncodedVideoChunkType type;
  required unsigned long long timestamp;
  unsigned long long duration;
  required BufferSource data;
};

enum EncodedVideoChunkType {
    "key",
    "delta",
};


[Exposed=(Window,Worker)]
interface AudioFrame {
  constructor(AudioFrameInit init);
  readonly attribute unsigned long long timestamp;
  readonly attribute AudioBuffer? buffer;
  undefined close();
};

dictionary AudioFrameInit {
  required unsigned long long timestamp;
  required AudioBuffer buffer;
};


[Exposed=(Window,Worker)]
interface VideoFrame {
  constructor(ImageBitmap imageBitmap, VideoFrameInit frameInit);
  constructor(PixelFormat pixelFormat, sequence<(Plane or PlaneInit)> planes,
              VideoFrameInit frameInit);

  readonly attribute PixelFormat format;
  readonly attribute FrozenArray<Plane> planes;
  readonly attribute unsigned long codedWidth;
  readonly attribute unsigned long codedHeight;
  readonly attribute unsigned long cropLeft;
  readonly attribute unsigned long cropTop;
  readonly attribute unsigned long cropWidth;
  readonly attribute unsigned long cropHeight;
  readonly attribute unsigned long displayWidth;
  readonly attribute unsigned long displayHeight;
  readonly attribute unsigned long long? duration;
  readonly attribute unsigned long long? timestamp;

  undefined destroy();
  VideoFrame clone();

  Promise<ImageBitmap> createImageBitmap(
    optional ImageBitmapOptions options = {});

};

dictionary VideoFrameInit {
  unsigned long codedWidth;
  unsigned long codedHeight;
  unsigned long cropLeft;
  unsigned long cropTop;
  unsigned long cropWidth;
  unsigned long cropHeight;
  unsigned long displayWidth;
  unsigned long displayHeight;
  unsigned long long duration;
  unsigned long long timestamp;
};


interface Plane {
  readonly attribute unsigned long stride;
  readonly attribute unsigned long rows;
  readonly attribute unsigned long length;

  undefined readInto(ArrayBufferView dst);
};

dictionary PlaneInit {
  required BufferSource src;
  required unsigned long stride;
  required unsigned long rows;
};


enum PixelFormat {
  "I420"
};


[Exposed=Window]
interface VideoTrackReader {
  constructor(MediaStreamTrack track);

  readonly attribute VideoTrackReaderState readyState;
  attribute EventHandler onended;

  undefined start(VideoFrameOutputCallback callback);
  undefined stop();
};

enum VideoTrackReaderState {
  "started",
  "stopped",
  "ended"
};