WebCodecs

Draft Community Group Report

This version:
https://wicg.github.io/web-codecs/
Issue Tracking:
GitHub
Inline In Spec
Editors:
Chris Cunningham (Google Inc.)
Paul Adenot (Mozilla)
Bernard Aboba (Microsoft Corporation)
Participate:
Git Repository.
File an issue.
Version History:
https://github.com/wicg/web-codecs/commits

Abstract

This specification defines interfaces for encoding and decoding audio and video. It also includes an interface for retrieving raw video frames from MediaStreamTracks.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Definitions

Codec

Refers generically to an instance of AudioDecoder, AudioEncoder, VideoDecoder, or VideoEncoder.

Key Frame

An encoded frame that does not depend on any other frames for decoding.

Internal Pending Output

Codec outputs such as VideoFrames that currently reside in the internal pipeline of the underlying codec implementation. The underlying codec implementation may emit new outputs only when new inputs are provided. The underlying codec implementation must emit all outputs in response to a flush.

Codec System Resources

Resources including CPU memory, GPU memory, and exclusive handles to specific decoding/encoding hardware that may be allocated by the User Agent as part of codec configuration or generation of AudioFrame and VideoFrame objects. Such resources may be quickly exhausted and should be released immediately when no longer in use.

Advanced Video Coding (AVC)

Also known as H.264. Refers to the methods of video compression as defined by [iso14496-10].

Picture Parameter Set (PPS)

A set of parameters describing AVC coded pictures as defined by [iso14496-10].

Sequence Parameter Set (SPS)

A set of parameters describing a sequence of AVC coded video as defined by [iso14496-10].

2. Codec Processing Model

2.1. Background

This section is non-normative.

The codec interfaces defined by this specification are designed such that new codec tasks may be scheduled while previous tasks are still pending. For example, web authors may call decode() without waiting for a previous decode() to complete. This is achieved by offloading underlying codec tasks to a separate thread for parallel execution.

This section describes threading behaviors as they are visible from the perspective of web authors. Implementers may choose to use more or fewer threads as long as the externally visible behaviors of blocking and sequencing are maintained as follows.

2.2. Control Thread and Codec Thread

All steps in this specification will run on either a control thread or a codec thread.

The control thread is the thread from which authors will construct a codec and invoke its methods. Invoking a codec’s methods will typically result in the creation of control messages which are later executed on the codec thread. Each global object has a separate control thread.

The codec thread is the thread from which a codec will dequeue control messages and execute their steps. Each codec instance has a separate codec thread. The lifetime of a codec thread matches that of its associated codec instance.

The control thread uses a traditional event loop, as described in [HTML].

The codec thread uses a specialized codec processing loop.

Communication from the control thread to the codec thread is done using control message passing. Communication in the other direction is done using regular event loop tasks.

Each codec instance has a single control message queue that is a queue of control messages.

Queuing a control message means enqueuing the message to a codec’s control message queue. Invoking codec methods will often queue a control message to schedule work.

Running a control message means performing a sequence of steps specified by the method that enqueued the message. The steps of a control message may depend on injected state, supplied by the method that enqueued the message.

Resetting the control message queue means performing these steps:

  1. For each control message in the control message queue:

    1. If a control message’s injected state includes a promise, reject that promise.

    2. Remove the message from the queue.

The codec processing loop must run these steps:

  1. While true:

    1. If the control message queue is empty, continue.

    2. Dequeue front message from the control message queue.

    3. Run control message steps described by front message.
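
For example (non-normative), this model means that authors may queue many control messages without waiting for earlier ones to finish. A minimal sketch, assuming decoder is a configured VideoDecoder and chunks is an array of EncodedVideoChunks:

// Each decode() call queues a control message and returns immediately;
// the codec thread dequeues and runs the messages in order.
for (const chunk of chunks)
  decoder.decode(chunk);

// flush() queues a final control message and returns a promise that
// resolves once all previously queued work has emitted its outputs.
await decoder.flush();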

3. AudioDecoder Interface

[Exposed=(Window,DedicatedWorker)]
interface AudioDecoder {
  constructor(AudioDecoderInit init);

  readonly attribute CodecState state;
  readonly attribute long decodeQueueSize;

  undefined configure(AudioDecoderConfig config);
  undefined decode(EncodedAudioChunk chunk);
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<AudioDecoderSupport> isConfigSupported(AudioDecoderConfig config);
};

dictionary AudioDecoderInit {
  required AudioFrameOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback AudioFrameOutputCallback = undefined(AudioFrame output);
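
A non-normative usage sketch follows; the codec string and configuration values are illustrative, and handleFrame is a hypothetical application function:

const decoder = new AudioDecoder({
  output: (frame) => {
    // frame is an AudioFrame; frame.buffer holds the decoded audio data.
    handleFrame(frame);
  },
  error: (e) => console.error(e.message),
});

decoder.configure({
  codec: "opus",           // illustrative codec string
  sampleRate: 48000,
  numberOfChannels: 2,
});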

3.1. Internal Slots

[[codec implementation]]
Underlying decoder implementation provided by the User Agent.
[[output callback]]
Callback given at construction for decoded outputs.
[[error callback]]
Callback given at construction for decode errors.

3.2. Constructors

AudioDecoder(init)
  1. Let d be a new AudioDecoder object.

  2. Assign init.output to the [[output callback]] internal slot.

  3. Assign init.error to the [[error callback]] internal slot.

  4. Assign "unconfigured" to d.state.

  5. Return d.

3.3. Attributes

state, of type CodecState, readonly
Describes the current state of the codec.
decodeQueueSize, of type long, readonly
The number of pending decode requests. This number will decrease as the underlying codec is ready to accept new input.

3.4. Methods

configure(config)
Enqueues a control message to configure the audio decoder for decoding chunks as described by config.

NOTE: Authors should first check support by calling isConfigSupported() with config to avoid error paths in the steps below.

When invoked, run these steps:

  1. If config is not a valid AudioDecoderConfig, throw a TypeError.

  2. If state is "closed", throw an InvalidStateError.

  3. Set state to "configured".

  4. Queue a control message to configure the decoder with config.

Running a control message to configure the decoder means running these steps:

  1. Let supported be the result of running the Check Configuration Support algorithm with config.

  2. If supported is true, assign [[codec implementation]] with an implementation supporting config.

  3. Otherwise, run the Close AudioDecoder algorithm with NotSupportedError.

decode(chunk)
Enqueues a control message to decode the given chunk.

When invoked, run these steps:

  1. If state is not "configured", throw an InvalidStateError.

  2. Increment decodeQueueSize.

  3. Queue a control message to decode the chunk.

Running a control message to decode the chunk means performing these steps:

  1. Attempt to use [[codec implementation]] to decode the chunk.

  2. If decoding results in an error, queue a task on the control thread event loop to run the Close AudioDecoder algorithm with EncodingError.

  3. Queue a task on the control thread event loop to decrement decodeQueueSize.

  4. Let decoded outputs be a list of decoded audio data outputs emitted by [[codec implementation]].

  5. If decoded outputs is not empty, queue a task on the control thread event loop to run the Output AudioFrames algorithm with decoded outputs.

flush()
Completes all control messages in the control message queue and emits all outputs.

When invoked, run these steps:

  1. If state is not "configured", return a promise rejected with InvalidStateError DOMException.

  2. Let promise be a new Promise.

  3. Queue a control message to flush the codec with promise.

  4. Return promise.

Running a control message to flush the codec means performing these steps with promise:

  1. Signal [[codec implementation]] to emit all internal pending outputs.

  2. Let decoded outputs be a list of decoded audio data outputs emitted by [[codec implementation]].

  3. If decoded outputs is not empty, queue a task on the control thread event loop to run the Output AudioFrames algorithm with decoded outputs.

  4. Queue a task on the control thread event loop to resolve promise.

reset()
Immediately resets all state including configuration, control messages in the control message queue, and all pending callbacks.

When invoked, run the Reset AudioDecoder algorithm.

close()
Immediately aborts all pending work and releases system resources. Close is final.

When invoked, run the Close AudioDecoder algorithm.

isConfigSupported(config)
Returns a promise indicating whether the provided config is supported by the user agent.

NOTE: The returned AudioDecoderSupport config will contain only the dictionary members that the user agent recognized. Unrecognized dictionary members will be ignored. Authors may detect unrecognized dictionary members by comparing config to their provided config.

When invoked, run these steps:

  1. If config is not a valid AudioDecoderConfig, return a promise rejected with TypeError.

  2. Let p be a new Promise.

  3. Let checkSupportQueue be the result of starting a new parallel queue.

  4. Enqueue the following steps to checkSupportQueue:

    1. Let decoderSupport be a newly constructed AudioDecoderSupport, initialized as follows:

      1. Set config to the result of running the Clone Configuration algorithm with config.

      2. Set supported to the result of running the Check Configuration Support algorithm with config.

    2. Resolve p with decoderSupport.

  5. Return p.
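
For example (non-normative), authors can use the returned config to detect unrecognized dictionary members, as described in the NOTE above; futureMember is a hypothetical member name:

const config = {
  codec: "mp4a.40.2",   // illustrative codec string
  sampleRate: 44100,
  numberOfChannels: 2,
  futureMember: true,   // hypothetical member; may be unrecognized
};

const support = await AudioDecoder.isConfigSupported(config);

// Members absent from support.config were not recognized by the user agent.
if (support.supported && !("futureMember" in support.config))
  console.log("futureMember was ignored");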

3.5. Algorithms

Output AudioFrames (with outputs)
Run these steps:
  1. For each output in outputs:

    1. Let buffer be an AudioBuffer containing the decoded audio data in output.

    2. Let frame be an AudioFrame containing buffer and a timestamp for the output.

    3. Invoke [[output callback]] with frame.

Reset AudioDecoder
Run these steps:
  1. If state is "closed", throw an InvalidStateError.

  2. Set state to "unconfigured".

  3. Signal [[codec implementation]] to cease producing output for the previous configuration.

  4. Reset the control message queue.

  5. Set decodeQueueSize to zero.

Close AudioDecoder (with error)
Run these steps:
  1. Run the Reset AudioDecoder algorithm.

  2. Set state to "closed".

  3. Clear [[codec implementation]] and release associated system resources.

  4. If error is set, queue a task on the control thread event loop to invoke the [[error callback]] with error.

4. VideoDecoder Interface

[Exposed=(Window,DedicatedWorker)]
interface VideoDecoder {
  constructor(VideoDecoderInit init);

  readonly attribute CodecState state;
  readonly attribute long decodeQueueSize;

  undefined configure(VideoDecoderConfig config);
  undefined decode(EncodedVideoChunk chunk);
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<VideoDecoderSupport> isConfigSupported(VideoDecoderConfig config);
};

dictionary VideoDecoderInit {
  required VideoFrameOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback VideoFrameOutputCallback = undefined(VideoFrame output);
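
A non-normative usage sketch follows; the codec string is illustrative and canvasContext is an assumed CanvasRenderingContext2D:

const decoder = new VideoDecoder({
  output: async (frame) => {
    // Paint the decoded frame, then release its resources promptly
    // (see Codec System Resources in § 1).
    const bitmap = await frame.createImageBitmap();
    canvasContext.drawImage(bitmap, 0, 0);
    frame.destroy();
  },
  error: (e) => console.error(e.message),
});

decoder.configure({
  codec: "vp09.00.10.08",   // illustrative codec string
  codedWidth: 1920,
  codedHeight: 1080,
});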

4.1. Internal Slots

[[codec implementation]]
Underlying decoder implementation provided by the User Agent.
[[output callback]]
Callback given at construction for decoded outputs.
[[error callback]]
Callback given at construction for decode errors.

4.2. Constructors

VideoDecoder(init)
  1. Let d be a new VideoDecoder object.

  2. Assign init.output to the [[output callback]] internal slot.

  3. Assign init.error to the [[error callback]] internal slot.

  4. Assign "unconfigured" to d.state.

  5. Return d.

4.3. Attributes

state, of type CodecState, readonly
Describes the current state of the codec.
decodeQueueSize, of type long, readonly
The number of pending decode requests. This number will decrease as the underlying codec is ready to accept new input.

4.4. Methods

configure(config)
Enqueues a control message to configure the video decoder for decoding chunks as described by config.

NOTE: Authors should first check support by calling isConfigSupported() with config to avoid error paths in the steps below.

When invoked, run these steps:

  1. If config is not a valid VideoDecoderConfig, throw a TypeError.

  2. If state is "closed", throw an InvalidStateError.

  3. Set state to "configured".

  4. Queue a control message to configure the decoder with config.

Running a control message to configure the decoder means running these steps:

  1. Let supported be the result of running the Check Configuration Support algorithm with config.

  2. If supported is true, assign [[codec implementation]] with an implementation supporting config.

  3. Otherwise, run the Close VideoDecoder algorithm with NotSupportedError.

decode(chunk)
Enqueues a control message to decode the given chunk.

When invoked, run these steps:

  1. If state is not "configured", throw an InvalidStateError.

  2. Increment decodeQueueSize.

  3. Queue a control message to decode the chunk.

Running a control message to decode the chunk means performing these steps:

  1. Attempt to use [[codec implementation]] to decode the chunk.

  2. If decoding results in an error, queue a task on the control thread event loop to run the Close VideoDecoder algorithm with EncodingError.

  3. Queue a task on the control thread event loop to decrement decodeQueueSize.

  4. Let decoded outputs be a list of decoded video data outputs emitted by [[codec implementation]].

  5. If decoded outputs is not empty, queue a task on the control thread event loop to run the Output VideoFrames algorithm with decoded outputs.

flush()
Completes all control messages in the control message queue and emits all outputs.

When invoked, run these steps:

  1. If state is not "configured", return a promise rejected with InvalidStateError DOMException.

  2. Let promise be a new Promise.

  3. Queue a control message to flush the codec with promise.

  4. Return promise.

Running a control message to flush the codec means performing these steps with promise:

  1. Signal [[codec implementation]] to emit all internal pending outputs.

  2. Let decoded outputs be a list of decoded video data outputs emitted by [[codec implementation]].

  3. If decoded outputs is not empty, queue a task on the control thread event loop to run the Output VideoFrames algorithm with decoded outputs.

  4. Queue a task on the control thread event loop to resolve promise.

reset()
Immediately resets all state including configuration, control messages in the control message queue, and all pending callbacks.

When invoked, run the Reset VideoDecoder algorithm.

close()
Immediately aborts all pending work and releases system resources. Close is final.

When invoked, run the Close VideoDecoder algorithm.

isConfigSupported(config)
Returns a promise indicating whether the provided config is supported by the user agent.

NOTE: The returned VideoDecoderSupport config will contain only the dictionary members that the user agent recognized. Unrecognized dictionary members will be ignored. Authors may detect unrecognized dictionary members by comparing config to their provided config.

When invoked, run these steps:

  1. If config is not a valid VideoDecoderConfig, return a promise rejected with TypeError.

  2. Let p be a new Promise.

  3. Let checkSupportQueue be the result of starting a new parallel queue.

  4. Enqueue the following steps to checkSupportQueue:

    1. Let decoderSupport be a newly constructed VideoDecoderSupport, initialized as follows:

      1. Set config to the result of running the Clone Configuration algorithm with config.

      2. Set supported to the result of running the Check Configuration Support algorithm with config.

    2. Resolve p with decoderSupport.

  5. Return p.

4.5. Algorithms

Output VideoFrames (with outputs)
Run these steps:
  1. For each output in outputs:

    1. Let planes be a sequence of Planes containing the decoded video frame data from output.

    2. Let pixelFormat be the PixelFormat of planes.

    3. Let frameInit be a VideoFrameInit with the following keys:

      1. Let timestamp and duration be the timestamp and duration from the EncodedVideoChunk associated with output.

      2. Let codedWidth and codedHeight be the width and height of the decoded video frame output in pixels, prior to any cropping or aspect ratio adjustments.

      3. Let cropLeft, cropTop, cropWidth, and cropHeight be the crop region of the decoded video frame output in pixels, prior to any aspect ratio adjustments.

      4. Let displayWidth and displayHeight be the display size of the decoded video frame in pixels.

    4. Let frame be a VideoFrame, constructed with pixelFormat, planes, and frameInit.

    5. Invoke [[output callback]] with frame.

Reset VideoDecoder
Run these steps:
  1. If state is "closed", throw an InvalidStateError.

  2. Set state to "unconfigured".

  3. Signal [[codec implementation]] to cease producing output for the previous configuration.

  4. Reset the control message queue.

  5. Set decodeQueueSize to zero.

Close VideoDecoder (with error)
Run these steps:
  1. Run the Reset VideoDecoder algorithm.

  2. Set state to "closed".

  3. Clear [[codec implementation]] and release associated system resources.

  4. If error is set, queue a task on the control thread event loop to invoke the [[error callback]] with error.

5. AudioEncoder Interface

[Exposed=(Window,DedicatedWorker)]
interface AudioEncoder {
  constructor(AudioEncoderInit init);

  readonly attribute CodecState state;
  readonly attribute long encodeQueueSize;

  undefined configure(AudioEncoderConfig config);
  undefined encode(AudioFrame frame);
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<AudioEncoderSupport> isConfigSupported(AudioEncoderConfig config);
};

dictionary AudioEncoderInit {
  required EncodedAudioChunkOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback EncodedAudioChunkOutputCallback = undefined(EncodedAudioChunk output);
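
A non-normative usage sketch follows; the codec string is illustrative and muxer is a hypothetical application sink:

const encoder = new AudioEncoder({
  output: (chunk) => muxer.addChunk(chunk),
  error: (e) => console.error(e.message),
});

encoder.configure({
  codec: "opus",        // illustrative codec string
  sampleRate: 48000,
  numberOfChannels: 1,
});

// encode() destroys its input (see the NOTE in § 5.4);
// clone first if the frame is still needed.
encoder.encode(audioFrame);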

5.1. Internal Slots

[[codec implementation]]
Underlying encoder implementation provided by the User Agent.
[[output callback]]
Callback given at construction for encoded outputs.
[[error callback]]
Callback given at construction for encode errors.

5.2. Constructors

AudioEncoder(init)
  1. Let e be a new AudioEncoder object.

  2. Assign init.output to the [[output callback]] internal slot.

  3. Assign init.error to the [[error callback]] internal slot.

  4. Assign "unconfigured" to e.state.

  5. Return e.

5.3. Attributes

state, of type CodecState, readonly
Describes the current state of the codec.
encodeQueueSize, of type long, readonly
The number of pending encode requests. This number will decrease as the underlying codec is ready to accept new input.

5.4. Methods

configure(config)
Enqueues a control message to configure the audio encoder for encoding frames as described by config.

NOTE: Authors should first check support by calling isConfigSupported() with config to avoid error paths in the steps below.

When invoked, run these steps:

  1. If config is not a valid AudioEncoderConfig, throw a TypeError.

  2. If state is "closed", throw an InvalidStateError.

  3. Set state to "configured".

  4. Queue a control message to configure the encoder using config.

Running a control message to configure the encoder means performing these steps:

  1. Let supported be the result of running the Check Configuration Support algorithm with config.

  2. If supported is true, assign [[codec implementation]] with an implementation supporting config.

  3. Otherwise, run the Close AudioEncoder algorithm with NotSupportedError.

encode(frame)
Enqueues a control message to encode the given frame.

NOTE: This method will destroy the AudioFrame. Authors who wish to retain a copy should call frame.clone() prior to calling encode().

When invoked, run these steps:

  1. If the value of frame’s [[detached]] internal slot is true, throw a TypeError.

  2. If state is not "configured", throw an InvalidStateError.

  3. Let frameClone hold the result of running the Clone Frame algorithm with frame.

  4. Destroy the original frame by invoking frame.destroy().

  5. Increment encodeQueueSize.

  6. Queue a control message to encode frameClone.

Running a control message to encode the frame means performing these steps:

  1. Attempt to use [[codec implementation]] to encode frameClone.

  2. If encoding results in an error, queue a task on the control thread event loop to run the Close AudioEncoder algorithm with EncodingError.

  3. Queue a task on the control thread event loop to decrement encodeQueueSize.

  4. Let encoded outputs be a list of encoded audio data outputs emitted by [[codec implementation]].

  5. If encoded outputs is not empty, queue a task on the control thread event loop to run the Output EncodedAudioChunks algorithm with encoded outputs.

flush()
Completes all control messages in the control message queue and emits all outputs.

When invoked, run these steps:

  1. If state is not "configured", return a promise rejected with InvalidStateError DOMException.

  2. Let promise be a new Promise.

  3. Queue a control message to flush the codec with promise.

  4. Return promise.

Running a control message to flush the codec means performing these steps with promise.

  1. Signal [[codec implementation]] to emit all internal pending outputs.

  2. Let encoded outputs be a list of encoded audio data outputs emitted by [[codec implementation]].

  3. If encoded outputs is not empty, queue a task on the control thread event loop to run the Output EncodedAudioChunks algorithm with encoded outputs.

  4. Queue a task on the control thread event loop to resolve promise.

reset()
Immediately resets all state including configuration, control messages in the control message queue, and all pending callbacks.

When invoked, run the Reset AudioEncoder algorithm.

close()
Immediately aborts all pending work and releases system resources. Close is final.

When invoked, run the Close AudioEncoder algorithm.

isConfigSupported(config)
Returns a promise indicating whether the provided config is supported by the user agent.

NOTE: The returned AudioEncoderSupport config will contain only the dictionary members that the user agent recognized. Unrecognized dictionary members will be ignored. Authors may detect unrecognized dictionary members by comparing config to their provided config.

When invoked, run these steps:

  1. If config is not a valid AudioEncoderConfig, return a promise rejected with TypeError.

  2. Let p be a new Promise.

  3. Let checkSupportQueue be the result of starting a new parallel queue.

  4. Enqueue the following steps to checkSupportQueue:

    1. Let encoderSupport be a newly constructed AudioEncoderSupport, initialized as follows:

      1. Set config to the result of running the Clone Configuration algorithm with config.

      2. Set supported to the result of running the Check Configuration Support algorithm with config.

    2. Resolve p with encoderSupport.

  5. Return p.

5.5. Algorithms

Output EncodedAudioChunks (with outputs)
Run these steps:
  1. For each output in outputs:

    1. Let chunkInit be an EncodedAudioChunkInit with the following keys:

      1. Let data contain the encoded audio data from output.

      2. Let type be the EncodedAudioChunkType of output.

      3. Let timestamp be the timestamp from the AudioFrame associated with output.

    2. Let chunk be a new EncodedAudioChunk constructed with chunkInit.

    3. Invoke [[output callback]] with chunk.

Reset AudioEncoder
Run these steps:
  1. If state is "closed", throw an InvalidStateError.

  2. Set state to "unconfigured".

  3. Signal [[codec implementation]] to cease producing output for the previous configuration.

  4. Reset the control message queue.

  5. Set encodeQueueSize to zero.

Close AudioEncoder (with error)
Run these steps:
  1. Run the Reset AudioEncoder algorithm.

  2. Set state to "closed".

  3. Clear [[codec implementation]] and release associated system resources.

  4. If error is set, queue a task on the control thread event loop to invoke the [[error callback]] with error.

6. VideoEncoder Interface

[Exposed=(Window,DedicatedWorker)]
interface VideoEncoder {
  constructor(VideoEncoderInit init);

  readonly attribute CodecState state;
  readonly attribute long encodeQueueSize;

  undefined configure(VideoEncoderConfig config);
  undefined encode(VideoFrame frame, optional VideoEncoderEncodeOptions options = {});
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<VideoEncoderSupport> isConfigSupported(VideoEncoderConfig config);
};

dictionary VideoEncoderInit {
  required EncodedVideoChunkOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback EncodedVideoChunkOutputCallback = undefined(EncodedVideoChunk output, VideoDecoderConfig? output_config);
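
A non-normative sketch of an output callback follows. Because output_config is non-null only when the decoder configuration changes (see § 6.5), authors can forward it alongside the chunks; transport is a hypothetical application object:

const encoder = new VideoEncoder({
  output: (chunk, output_config) => {
    // The first output always carries a non-null output_config.
    if (output_config)
      transport.sendConfig(output_config);
    transport.sendChunk(chunk);
  },
  error: (e) => console.error(e.message),
});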

6.1. Internal Slots

[[codec implementation]]
Underlying encoder implementation provided by the User Agent.
[[output callback]]
Callback given at construction for encoded outputs.
[[error callback]]
Callback given at construction for encode errors.
[[active encoder config]]
The VideoEncoderConfig that is actively applied.
[[active output config]]
The VideoDecoderConfig that describes how to decode the most recently emitted EncodedVideoChunk.

6.2. Constructors

VideoEncoder(init)
  1. Let e be a new VideoEncoder object.

  2. Assign init.output to the [[output callback]] internal slot.

  3. Assign init.error to the [[error callback]] internal slot.

  4. Assign "unconfigured" to e.state.

  5. Return e.

6.3. Attributes

state, of type CodecState, readonly
Describes the current state of the codec.
encodeQueueSize, of type long, readonly
The number of pending encode requests. This number will decrease as the underlying codec is ready to accept new input.

6.4. Methods

configure(config)
Enqueues a control message to configure the video encoder for encoding frames as described by config.

NOTE: Authors should first check support by calling isConfigSupported() with config to avoid error paths in the steps below.

When invoked, run these steps:

  1. If config is not a valid VideoEncoderConfig, throw a TypeError.

  2. If state is "closed", throw an InvalidStateError.

  3. Set state to "configured".

  4. Queue a control message to configure the encoder using config.

Running a control message to configure the encoder means performing these steps:

  1. Let supported be the result of running the Check Configuration Support algorithm with config.

  2. If supported is true, assign [[codec implementation]] with an implementation supporting config.

  3. Otherwise, run the Close VideoEncoder algorithm with NotSupportedError and abort these steps.

  4. Set [[active encoder config]] to config.

encode(frame, options)
Enqueues a control message to encode the given frame.

NOTE: This method will destroy the VideoFrame. Authors who wish to retain a copy should call frame.clone() prior to calling encode().

When invoked, run these steps:

  1. If the value of frame’s [[detached]] internal slot is true, throw a TypeError.

  2. If state is not "configured", throw an InvalidStateError.

  3. Let frameClone hold the result of running the Clone Frame algorithm with frame.

  4. Destroy the original frame by invoking frame.destroy().

  5. Increment encodeQueueSize.

  6. Queue a control message to encode frameClone.

Running a control message to encode the frame means performing these steps:

  1. Attempt to use [[codec implementation]] to encode frameClone according to options.

  2. If encoding results in an error, queue a task on the control thread event loop to run the Close VideoEncoder algorithm with EncodingError.

  3. Queue a task on the control thread event loop to decrement encodeQueueSize.

  4. Let encoded outputs be a list of encoded video data outputs emitted by [[codec implementation]].

  5. If encoded outputs is not empty, queue a task on the control thread event loop to run the Output EncodedVideoChunks algorithm with encoded outputs.

flush()
Completes all control messages in the control message queue and emits all outputs.

When invoked, run these steps:

  1. If state is not "configured", return a promise rejected with InvalidStateError DOMException.

  2. Let promise be a new Promise.

  3. Queue a control message to flush the codec with promise.

  4. Return promise.

Running a control message to flush the codec means performing these steps with promise:

  1. Signal [[codec implementation]] to emit all internal pending outputs.

  2. Let encoded outputs be a list of encoded video data outputs emitted by [[codec implementation]].

  3. If encoded outputs is not empty, queue a task on the control thread event loop to run the Output EncodedVideoChunks algorithm with encoded outputs.

  4. Queue a task on the control thread event loop to resolve promise.

reset()
Immediately resets all state including configuration, control messages in the control message queue, and all pending callbacks.

When invoked, run the Reset VideoEncoder algorithm.

close()
Immediately aborts all pending work and releases system resources. Close is final.

When invoked, run the Close VideoEncoder algorithm.

isConfigSupported(config)
Returns a promise indicating whether the provided config is supported by the user agent.

NOTE: The returned VideoEncoderSupport config will contain only the dictionary members that the user agent recognized. Unrecognized dictionary members will be ignored. Authors may detect unrecognized dictionary members by comparing config to their provided config.

When invoked, run these steps:

  1. If config is not a valid VideoEncoderConfig, return a promise rejected with TypeError.

  2. Let p be a new Promise.

  3. Let checkSupportQueue be the result of starting a new parallel queue.

  4. Enqueue the following steps to checkSupportQueue:

    1. Let encoderSupport be a newly constructed VideoEncoderSupport, initialized as follows:

      1. Set config to the result of running the Clone Configuration algorithm with config.

      2. Set supported to the result of running the Check Configuration Support algorithm with config.

    2. Resolve p with encoderSupport.

  5. Return p.

6.5. Algorithms

Output EncodedVideoChunks (with outputs)
Run these steps:
  1. For each output in outputs:

    1. Let encoder_config be the [[active encoder config]].

      The intent is for encoder_config to be the [[active encoder config]] that was used to encode output. But, as written, it may occur that output was encoded using a previous VideoEncoderConfig that has since been replaced by a later call to configure(). See #138.

    2. Let output_config be a VideoDecoderConfig that describes output. Initialize output_config as follows:

      1. Assign encoder_config.codec to output_config.codec.

      2. Assign encoder_config.cropWidth to output_config.cropWidth.

      3. Assign encoder_config.cropHeight to output_config.cropHeight.

      4. Assign encoder_config.displayWidth to output_config.displayWidth.

      5. Assign encoder_config.displayHeight to output_config.displayHeight.

      6. Assign the remaining keys of output_config as determined by [[codec implementation]]. The user agent must ensure that the configuration is completely described such that output_config could be used to correctly decode output.

        NOTE: This includes supplying the description to describe codec specific "extradata" like the avcC bytes for AVC.

    3. If output_config and [[active output config]] are equal dictionaries, set output_config to null. Otherwise, set [[active output config]] to output_config.

      NOTE: The VideoDecoderConfig output_config will be null if the configuration hasn’t changed from previous outputs. The first output will always include a non-null output_config.

    4. Let chunkInit be an EncodedVideoChunkInit with the following keys:

      1. Let data contain the encoded video data from output.

      2. Let type be the EncodedVideoChunkType of output.

      3. Let timestamp be the timestamp from the VideoFrame associated with output.

      4. Let duration be the duration from the VideoFrame associated with output.

    5. Let chunk be a new EncodedVideoChunk constructed with chunkInit.

    6. Invoke [[output callback]] with chunk.

Reset VideoEncoder
Run these steps:
  1. If state is "closed", throw an InvalidStateError.

  2. Set state to "unconfigured".

  3. Set [[active encoder config]] to null.

  4. Set [[active output config]] to null.

  5. Signal [[codec implementation]] to cease producing output for the previous configuration.

  6. Reset the control message queue.

  7. Set encodeQueueSize to zero.

Close VideoEncoder (with error)
Run these steps:
  1. Run the Reset VideoEncoder algorithm.

  2. Set state to "closed".

  3. Clear [[codec implementation]] and release associated system resources.

  4. If error is set, queue a task on the control thread event loop to invoke the [[error callback]] with error.

7. Configurations

7.1. Check Configuration Support (with config)

Run these steps:
  1. If the user agent can provide a codec to support all entries of the config, including applicable default values for keys that are not included, return true.

    NOTE: The types AudioDecoderConfig, VideoDecoderConfig, AudioEncoderConfig, and VideoEncoderConfig each define their respective configuration entries and defaults.

    NOTE: Support for a given configuration may change dynamically if the hardware is altered (e.g. external GPU unplugged) or if required hardware resources are exhausted. User agents should describe support on a best-effort basis given the resources that are available at the time of the query.

  2. Otherwise, return false.

7.2. Clone Configuration (with config)

NOTE: This algorithm will copy only the dictionary members that the user agent recognizes as part of the dictionary type.

Run these steps:

  1. Let dictType be the type of dictionary config.

  2. Let clone be a new empty instance of dictType.

  3. For each dictionary member m defined on dictType:

    1. If m does not exist in config, then continue.

    2. If config[m] is a nested dictionary, set clone[m] to the result of recursively running the Clone Configuration algorithm with config[m].

    3. Otherwise, assign the value of config[m] to clone[m].
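
A non-normative TypeScript sketch of this algorithm follows. Real user agents know the recognized members of each dictionary type; here recognized maps each recognized member name to the recognized-member map of its nested dictionary type, or to null for non-dictionary members:

type Recognized = Map<string, Recognized | null>;

function cloneConfiguration(config: any, recognized: Recognized): any {
  const clone: any = {};
  for (const [m, nested] of recognized) {
    if (!(m in config))
      continue;                                  // step 3.1
    clone[m] = nested
      ? cloneConfiguration(config[m], nested)    // step 3.2: nested dictionary
      : config[m];                               // step 3.3
  }
  return clone;
}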

7.3. Signalling Configuration Support

7.3.1. AudioDecoderSupport

dictionary AudioDecoderSupport {
  boolean supported;
  AudioDecoderConfig config;
};

supported, of type boolean
A boolean indicating whether the corresponding config is supported by the user agent.
config, of type AudioDecoderConfig
An AudioDecoderConfig used by the user agent in determining the value of supported.

7.3.2. VideoDecoderSupport

dictionary VideoDecoderSupport {
  boolean supported;
  VideoDecoderConfig config;
};

supported, of type boolean
A boolean indicating whether the corresponding config is supported by the user agent.
config, of type VideoDecoderConfig
A VideoDecoderConfig used by the user agent in determining the value of supported.

7.3.3. AudioEncoderSupport

dictionary AudioEncoderSupport {
  boolean supported;
  AudioEncoderConfig config;
};

supported, of type boolean
A boolean indicating whether the corresponding config is supported by the user agent.
config, of type AudioEncoderConfig
An AudioEncoderConfig used by the user agent in determining the value of supported.

7.3.4. VideoEncoderSupport

dictionary VideoEncoderSupport {
  boolean supported;
  VideoEncoderConfig config;
};

supported, of type boolean
A boolean indicating whether the corresponding config is supported by the user agent.
config, of type VideoEncoderConfig
A VideoEncoderConfig used by the user agent in determining the value of supported.

7.4. Codec String

In other media specifications, codec strings historically accompanied a MIME type as the "codecs=" parameter (isTypeSupported(), canPlayType()) [RFC6381]. In this specification, encoded media is not containerized; hence, only the value of the codecs parameter is accepted.

A valid codec string must meet the following conditions.

  1. It is valid per the relevant codec specification (see examples below).

  2. It describes a single codec.

    NOTE: Not a comma-separated list.

  3. It is unambiguous about codec profile and level for codecs that define these concepts.

NOTE: There is no unified specification for codec strings. Each codec has its own unique string format, specified by the authors of the codec in the relevant codec specification.

Valid examples include:

  1. "vp8"

  2. "vp09.00.10.08"

  3. "avc1.640028"

Invalid examples include:

  1. "video/webm; codecs=vp8" (a MIME type; only the value of the codecs parameter is accepted)

  2. "vp8, vp09.00.10.08" (a comma-separated list)

  3. "avc1" (ambiguous about profile and level)

7.5. AudioDecoderConfig

dictionary AudioDecoderConfig {
  required DOMString codec;
  required unsigned long sampleRate;
  required unsigned long numberOfChannels;
  BufferSource description;
};

To check if an AudioDecoderConfig is a valid AudioDecoderConfig, run these steps:

  1. If codec is not a valid codec string, return false.

  2. Return true.

codec, of type DOMString
Contains a codec string describing the codec.
sampleRate, of type unsigned long
The number of frame samples per second.
numberOfChannels, of type unsigned long
The number of audio channels.
description, of type BufferSource
A sequence of codec specific bytes, commonly known as extradata.

NOTE: For example, the vorbis "code book".

7.6. VideoDecoderConfig

dictionary VideoDecoderConfig {
  required DOMString codec;
  BufferSource description;
  required unsigned long codedWidth;
  required unsigned long codedHeight;
  unsigned long cropLeft;
  unsigned long cropTop;
  unsigned long cropWidth;
  unsigned long cropHeight;
  unsigned long displayWidth;
  unsigned long displayHeight;
  HardwareAcceleration hardwareAcceleration = "allow";
};

To check if a VideoDecoderConfig is a valid VideoDecoderConfig, run these steps:

  1. If codec is not a valid codec string, return false.

  2. If codedWidth = 0 or codedHeight = 0, return false.

  3. If cropWidth = 0 or cropHeight = 0, return false.

  4. If cropTop + cropHeight >= codedHeight, return false.

  5. If cropLeft + cropWidth >= codedWidth, return false.

  6. If displayWidth = 0 or displayHeight = 0, return false.

  7. Return true.

codec, of type DOMString
Contains a codec string describing the codec.
description, of type BufferSource
A sequence of codec specific bytes, commonly known as extradata.

NOTE: Examples include the VP9 vpcC bytes or the AVC avcC bytes.

codedWidth, of type unsigned long
Width of the VideoFrame in pixels, prior to any cropping or aspect ratio adjustments.
codedHeight, of type unsigned long
Height of the VideoFrame in pixels, prior to any cropping or aspect ratio adjustments.
cropLeft, of type unsigned long
The number of pixels to remove from the left of the VideoFrame, prior to aspect ratio adjustments. Defaults to zero if not present.
cropTop, of type unsigned long
The number of pixels to remove from the top of the VideoFrame, prior to aspect ratio adjustments. Defaults to zero if not present.
cropWidth, of type unsigned long
The width in pixels to include in the crop, starting from cropLeft. Defaults to codedWidth if not present.
cropHeight, of type unsigned long
The height in pixels to include in the crop, starting from cropTop. Defaults to codedHeight if not present.
displayWidth, of type unsigned long
Width of the VideoFrame when displayed. Defaults to cropWidth if not present.
displayHeight, of type unsigned long
Height of the VideoFrame when displayed. Defaults to cropHeight if not present.
hardwareAcceleration, of type HardwareAcceleration, defaulting to "allow"
Configures hardware acceleration for this codec. See HardwareAcceleration.
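
For example (non-normative), 1080p AVC video is commonly coded in 16-pixel macroblock rows, giving a coded height of 1088 with the bottom 8 rows cropped away. The codec string is illustrative and avcCBytes is a hypothetical BufferSource holding the avcC description:

const config = {
  codec: "avc1.640028",    // illustrative codec string
  description: avcCBytes,  // hypothetical avcC bytes
  codedWidth: 1920,
  codedHeight: 1088,       // coded in 16-pixel macroblock rows
  cropLeft: 0,
  cropTop: 0,
  cropWidth: 1920,
  cropHeight: 1080,        // bottom 8 rows are cropped away
  displayWidth: 1920,
  displayHeight: 1080,
};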

7.7. AudioEncoderConfig

dictionary AudioEncoderConfig {
  required DOMString codec;
  unsigned long sampleRate;
  unsigned long numberOfChannels;
};

To check if an AudioEncoderConfig is a valid AudioEncoderConfig, run these steps:

  1. If codec is not a valid codec string, return false.

  2. Return true.

codec, of type DOMString
Contains a codec string describing the codec.
sampleRate, of type unsigned long
The number of frame samples per second.
numberOfChannels, of type unsigned long
The number of audio channels.

7.8. VideoEncoderConfig

dictionary VideoEncoderConfig {
  required DOMString codec;
  unsigned long long bitrate;
  required unsigned long cropWidth;
  required unsigned long cropHeight;
  unsigned long displayWidth;
  unsigned long displayHeight;
  HardwareAcceleration hardwareAcceleration = "allow";

  AvcEncoderConfig avc;
};

To check if a VideoEncoderConfig is a valid VideoEncoderConfig, run these steps:

  1. If codec is not a valid codec string, return false.

  2. If cropWidth = 0 or cropHeight = 0, return false.

  3. If displayWidth = 0 or displayHeight = 0, return false.

  4. If avc is present, but codec does not describe the AVC codec, per [RFC6381], return false.

  5. Return true.

codec, of type DOMString
Contains a codec string describing the codec.
bitrate, of type unsigned long long
The average bitrate of the encoded video given in units of bits per second.
cropWidth, of type unsigned long
The encoded width of output EncodedVideoChunks in pixels, prior to any display aspect ratio adjustments.

The encoder must scale any VideoFrame whose cropWidth differs from this value.

cropHeight, of type unsigned long
The encoded height of output EncodedVideoChunks in pixels, prior to any display aspect ratio adjustments.

The encoder must scale any VideoFrame whose cropHeight differs from this value.

NOTE: The VideoEncoder will scale, not crop, input VideoFrames to the specified cropWidth and cropHeight.

The naming of cropWidth and cropHeight reflects that the encoded chunks may be decoded to produce VideoFrames whose cropWidth and cropHeight will match these values.

displayWidth, of type unsigned long
The intended display width of output EncodedVideoChunks in pixels. Defaults to cropWidth if not present.
displayHeight, of type unsigned long
The intended display height of output EncodedVideoChunks in pixels. Defaults to cropHeight if not present.
NOTE: Providing a displayWidth or displayHeight that differs from the crop dimensions signals that chunks should be scaled after decoding to arrive at the final display aspect ratio.

For many codecs this is merely pass-through information, but some codecs may optionally include display sizing in the bitstream.

avc, of type AvcEncoderConfig
Contains codec specific configuration options for the AVC (H.264) codec.
hardwareAcceleration, of type HardwareAcceleration, defaulting to "allow"
Configures hardware acceleration for this codec. See HardwareAcceleration.

7.8.1. AvcEncoderConfig

dictionary AvcEncoderConfig {
  AvcBitstreamFormat format = "avc";
};

format, of type AvcBitstreamFormat, defaulting to "avc"
Configures the format of output AVC EncodedVideoChunks. See AvcBitstreamFormat.
7.8.1.1. AvcBitstreamFormat
enum AvcBitstreamFormat {
  "annexb",
  "avc",
};

The AvcBitstreamFormat determines the location of Sequence Parameter Set (SPS) and Picture Parameter Set (PPS) data, and mechanisms for packaging the bitstream.

annexb
This format is as described by [iso14496-10], Annex B. Notably, SPS and PPS data are included periodically throughout the bitstream.

NOTE: This format is commonly used in live-streaming applications, where including the SPS and PPS data periodically allows users to easily start from the middle of the stream.

avc
This format is as described by [iso14496-15], Section 5. Notably, SPS and PPS data are not included in the bitstream and are instead passed out of band as an AVCDecoderConfigurationRecord as defined by [iso14496-15]. This structure is emitted via the [[output callback]] as the description of the VideoDecoderConfig output_config.

NOTE: This format is commonly used in .MP4 files, where the player generally has random access to the media data.
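
For example (non-normative), a live-streaming application might configure in-band parameter sets as follows, assuming encoder is a VideoEncoder and the codec string is illustrative:

encoder.configure({
  codec: "avc1.640028",       // illustrative codec string
  cropWidth: 1280,
  cropHeight: 720,
  bitrate: 3000000,
  avc: { format: "annexb" },  // SPS and PPS are carried in-band (Annex B)
});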

7.9. Hardware Acceleration

enum HardwareAcceleration {
  "allow",
  "deny",
  "require",
};

When supported, hardware acceleration offloads encoding or decoding to specialized hardware.

NOTE: Most authors will be best served by using the default of allow. This gives the user agent flexibility to optimize based on its knowledge of the system and configuration. A common strategy will be to prioritize hardware acceleration at higher resolutions with a fallback to software codecs if hardware acceleration fails.

Authors should carefully weigh the tradeoffs when setting a hardware acceleration preference. The precise tradeoffs will be device-specific, but authors should generally expect the following:

Given these tradeoffs, a good example of using "require" would be if an author intends to provide their own software based fallback via WebAssembly.

Alternatively, a good example of using "deny" would be if an author is especially sensitive to the higher startup latency or decreased robustness generally associated with hardware acceleration.

allow
Indicates that the user agent may use hardware acceleration if it is available and compatible with other aspects of the codec configuration.
deny
Indicates that the user agent must not use hardware acceleration.

NOTE: This will cause the configuration to be unsupported on platforms where an unaccelerated codec is unavailable or is incompatible with other aspects of the codec configuration.

require
Indicates that the user agent must use hardware acceleration.

NOTE: This will cause the configuration to be unsupported on platforms where an accelerated codec is unavailable or is incompatible with other aspects of the codec configuration.
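
For example (non-normative), an author with their own WebAssembly fallback might use "require" as follows; startWasmEncoder is a hypothetical application function:

const config = {
  codec: "vp09.00.10.08",           // illustrative codec string
  cropWidth: 1920,
  cropHeight: 1080,
  hardwareAcceleration: "require",
};

const { supported } = await VideoEncoder.isConfigSupported(config);
if (supported)
  encoder.configure(config);        // encoder is an assumed VideoEncoder
else
  startWasmEncoder(config);         // hypothetical software fallback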

7.10. Configuration Equivalence

Two dictionaries are equal dictionaries if they contain the same keys and values. For nested dictionaries, apply this definition recursively.

7.11. VideoEncoderEncodeOptions

dictionary VideoEncoderEncodeOptions {
  boolean keyFrame = false;
};

keyFrame, of type boolean, defaulting to false
A value of true indicates that the given frame MUST be encoded as a key frame. A value of false indicates that the user agent has flexibility to decide whether the frame will be encoded as a key frame.

7.12. CodecState

enum CodecState {
  "unconfigured",
  "configured",
  "closed"
};

unconfigured
The codec is not configured for encoding or decoding.
configured
A valid configuration has been provided. The codec is ready for encoding or decoding.
closed
The codec is no longer usable and underlying system resources have been released.

7.13. WebCodecsErrorCallback

callback WebCodecsErrorCallback = undefined(DOMException error);

8. Encoded Media Interfaces (Chunks)

These interfaces represent chunks of encoded media.

8.1. EncodedAudioChunk Interface

[Exposed=(Window,DedicatedWorker)]
interface EncodedAudioChunk {
  constructor(EncodedAudioChunkInit init);
  readonly attribute EncodedAudioChunkType type;
  readonly attribute unsigned long long timestamp;  // microseconds
  readonly attribute ArrayBuffer data;
};

dictionary EncodedAudioChunkInit {
  required EncodedAudioChunkType type;
  required unsigned long long timestamp;
  required BufferSource data;
};

enum EncodedAudioChunkType {
    "key",
    "delta",
};

8.1.1. Constructors

EncodedAudioChunk(init)
  1. Let chunk be a new EncodedAudioChunk object, initialized as follows:

    1. Assign init.type to chunk.type.

    2. Assign init.timestamp to chunk.timestamp.

    3. Assign a copy of init.data to chunk.data.

  2. Return chunk.

8.1.2. Attributes

type, of type EncodedAudioChunkType, readonly
Describes whether the chunk is a key frame.
timestamp, of type unsigned long long, readonly
The presentation timestamp, given in microseconds.
data, of type ArrayBuffer, readonly
A sequence of bytes containing encoded audio data.

8.2. EncodedVideoChunk Interface

[Exposed=(Window,DedicatedWorker)]
interface EncodedVideoChunk {
  constructor(EncodedVideoChunkInit init);
  readonly attribute EncodedVideoChunkType type;
  readonly attribute unsigned long long timestamp;  // microseconds
  readonly attribute unsigned long long? duration;  // microseconds
  readonly attribute ArrayBuffer data;
};

dictionary EncodedVideoChunkInit {
  required EncodedVideoChunkType type;
  required unsigned long long timestamp;
  unsigned long long duration;
  required BufferSource data;
};

enum EncodedVideoChunkType {
    "key",
    "delta",
};

8.2.1. Constructors

EncodedVideoChunk(init)
  1. Let chunk be a new EncodedVideoChunk object, initialized as follows:

    1. Assign init.type to chunk.type.

    2. Assign init.timestamp to chunk.timestamp.

    3. If duration is present in init, assign init.duration to chunk.duration. Otherwise, assign null to chunk.duration.

    4. Assign a copy of init.data to chunk.data.

  2. Return chunk.

8.2.2. Attributes

type, of type EncodedVideoChunkType, readonly
Describes whether the chunk is a key frame or not.
timestamp, of type unsigned long long, readonly
The presentation timestamp, given in microseconds.
duration, of type unsigned long long, readonly, nullable
The presentation duration, given in microseconds.
data, of type ArrayBuffer, readonly
A sequence of bytes containing encoded video data.
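
For example (non-normative), authors might construct chunks from demuxed container data; sample is a hypothetical demuxer output with the fields shown:

const chunk = new EncodedVideoChunk({
  type: sample.isKey ? "key" : "delta",
  timestamp: sample.pts,        // microseconds
  duration: sample.duration,    // microseconds; may be omitted
  data: sample.bytes,           // BufferSource of encoded video data
});

decoder.decode(chunk);          // decoder is an assumed VideoDecoder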

9. Raw Media Interfaces (Frames)

These interfaces represent unencoded (raw) media.

9.1. AudioFrame Interface

[Exposed=(Window,DedicatedWorker)]
interface AudioFrame {
  constructor(AudioFrameInit init);
  readonly attribute unsigned long long timestamp;
  readonly attribute AudioBuffer? buffer;
  undefined close();
};

dictionary AudioFrameInit {
  required unsigned long long timestamp;
  required AudioBuffer buffer;
};

9.1.1. Internal Slots

[[detached]]
Boolean indicating whether close() was invoked and underlying resources have been released.

9.1.2. Constructors

AudioFrame(init)
  1. Let frame be a new AudioFrame object.

  2. Assign init.timestamp to frame.timestamp.

  3. Assign init.buffer to frame.buffer.

  4. Assign false to the [[detached]] internal slot.

  5. Return frame.

9.1.3. Attributes

timestamp, of type unsigned long long, readonly
The presentation timestamp, given in microseconds.
buffer, of type AudioBuffer, readonly, nullable
The buffer containing decoded audio data.

9.1.4. Methods

close()
Immediately frees system resources. When invoked, run these steps:
  1. Release system resources for buffer and set its value to null.

  2. Assign true to the [[detached]] internal slot.

NOTE: This section needs work. We should use the name and semantics of VideoFrame destroy(). Similarly, we should add clone() to make a deep copy.

9.2. VideoFrame Interface

[Exposed=(Window,DedicatedWorker)]
interface VideoFrame {
  constructor(ImageBitmap imageBitmap, optional VideoFrameInit frameInit = {});
  constructor(PixelFormat pixelFormat, sequence<(Plane or PlaneInit)> planes,
              optional VideoFrameInit frameInit = {});

  readonly attribute PixelFormat format;
  readonly attribute FrozenArray<Plane> planes;
  readonly attribute unsigned long codedWidth;
  readonly attribute unsigned long codedHeight;
  readonly attribute unsigned long cropLeft;
  readonly attribute unsigned long cropTop;
  readonly attribute unsigned long cropWidth;
  readonly attribute unsigned long cropHeight;
  readonly attribute unsigned long displayWidth;
  readonly attribute unsigned long displayHeight;
  readonly attribute unsigned long long? duration;
  readonly attribute unsigned long long? timestamp;

  undefined destroy();
  VideoFrame clone();

  Promise<ImageBitmap> createImageBitmap(
    optional ImageBitmapOptions options = {});

};

dictionary VideoFrameInit {
  unsigned long codedWidth;
  unsigned long codedHeight;
  unsigned long cropLeft;
  unsigned long cropTop;
  unsigned long cropWidth;
  unsigned long cropHeight;
  unsigned long displayWidth;
  unsigned long displayHeight;
  unsigned long long duration;
  unsigned long long timestamp;
};

9.2.1. Internal Slots

[[detached]]
Boolean indicating whether destroy() was invoked and underlying resources have been released.

9.2.2. Constructors

NOTE: this section needs work. Current wording assumes a VideoFrame can always be easily represented using one of the known pixel formats. In practice, the underlying UA resources may be GPU backed or formatted in such a way that conversion to an allowed pixel format requires expensive copies and translation. When this occurs, we should allow planes to be null and format to be "opaque" to avoid early optimization. We should make conversion explicit and user controlled by offering a videoFrame.convertTo(format) that returns a Promise containing a new VideoFrame for which the copies/translations are performed.

VideoFrame(imageBitmap, frameInit)

  1. If frameInit is not a valid VideoFrameInit, throw a TypeError.

  2. If the value of imageBitmap’s [[Detached]] internal slot is set to true, then throw an InvalidStateError DOMException.

  3. Let frame be a new VideoFrame.

  4. Assign false to frame’s [[detached]] internal slot.

  5. Use a copy of the pixel data in imageBitmap to initialize the following frame attributes:

    1. Initialize frame.pixelFormat to the underlying format of imageBitmap.

    2. Initialize frame.planes to describe the arrangement of memory of the copied pixel data.

    3. Assign regions of the copied pixel data to the [[plane buffer]] internal slot of each plane as appropriate for the pixel format.

    4. Initialize frame.codedWidth and frame.codedHeight to describe the width and height of the imageBitmap prior to any cropping or aspect ratio adjustments.

  6. Use frameInit to initialize the remaining frame attributes:

    1. If frameInit.cropLeft is present, assign it to frame.cropLeft. Otherwise, default frame.cropLeft to zero.

    2. If frameInit.cropTop is present, assign it to frame.cropTop. Otherwise, default frame.cropTop to zero.

    3. If frameInit.cropWidth is present, assign it to frame.cropWidth. Otherwise, default frame.cropWidth to frame.codedWidth.

    4. If frameInit.cropHeight is present, assign it to frame.cropHeight. Otherwise, default frame.cropHeight to frame.codedHeight.

    5. If frameInit.displayWidth is present, assign it to frame.displayWidth. Otherwise, default frame.displayWidth to frame.codedWidth.

    6. If frameInit.displayHeight is present, assign it to frame.displayHeight. Otherwise, default frame.displayHeight to frame.codedHeight.

    7. If frameInit.duration is present, assign it to frame.duration. Otherwise, default frame.duration to null.

    8. If frameInit.timestamp is present, assign it to frame.timestamp. Otherwise, default frame.timestamp to null.

  7. Return frame.
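
For example (non-normative), assuming imageElement is a loaded HTMLImageElement and encoder is a configured VideoEncoder:

const bitmap = await createImageBitmap(imageElement);
const frame = new VideoFrame(bitmap, { timestamp: 0 });

// encode() destroys its input (see the NOTE in § 6.4);
// call frame.clone() first if the frame is still needed.
encoder.encode(frame);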

VideoFrame(pixelFormat, planes, frameInit)

  1. If either codedWidth or codedHeight is not present in frameInit, throw a TypeError.

  2. If frameInit is not a valid VideoFrameInit, throw a TypeError.

  3. If the length of planes is incompatible with the given pixelFormat, throw a TypeError.

  4. Let frame be a new VideoFrame object.

  5. Assign false to frame’s [[detached]] internal slot.

  6. Assign pixelFormat to frame.pixelFormat.

  7. For each element p in planes:

    1. If p is a Plane, append a copy of p to frame.planes. Continue processing the next element.

    2. If p is a PlaneInit, append a new Plane q to frame.planes initialized as follows:

      1. Assign a copy of p.src to q’s [[plane buffer]] internal slot.

      NOTE: The samples should be copied exactly, but the user agent may add row padding as needed to improve memory alignment.

      2. Assign the width of each row in [[plane buffer]], including any padding, to q.stride.

      3. Assign p.rows to q.rows.

      4. Assign the product of (q.rows * q.stride) to q.length.

  8. Assign frameInit.codedWidth to frame.codedWidth.

  9. Assign frameInit.codedHeight to frame.codedHeight.

  10. If frameInit.cropLeft is present, assign it to frame.cropLeft. Otherwise, default frame.cropLeft to zero.

  11. If frameInit.cropTop is present, assign it to frame.cropTop. Otherwise, default frame.cropTop to zero.

  12. If frameInit.cropWidth is present, assign it to frame.cropWidth. Otherwise, default frame.cropWidth to frame.codedWidth.

  13. If frameInit.cropHeight is present, assign it to frame.cropHeight. Otherwise, default frame.cropHeight to frame.codedHeight.

  14. If frameInit.displayWidth is present, assign it to frame.displayWidth. Otherwise, default frame.displayWidth to frame.codedWidth.

  15. If frameInit.displayHeight is present, assign it to frame.displayHeight. Otherwise, default frame.displayHeight to frame.codedHeight.

  16. If frameInit.duration is present, assign it to frame.duration. Otherwise, default frame.duration to null.

  17. If frameInit.timestamp is present, assign it to frame.timestamp. Otherwise, default frame.timestamp to null.

  18. Return frame.
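For example, here is a non-normative sketch of assembling an I420 VideoFrame from raw planes; the buffer contents, dimensions, and timestamp are illustrative assumptions:

// Non-normative sketch. The buffers are assumed to be filled with valid
// 4:2:0 samples elsewhere; dimensions and timestamp are assumptions.
const codedWidth = 640;
const codedHeight = 480;
const yData = new Uint8Array(codedWidth * codedHeight);
const uData = new Uint8Array((codedWidth / 2) * (codedHeight / 2));
const vData = new Uint8Array((codedWidth / 2) * (codedHeight / 2));

const frame = new VideoFrame('I420', [
  { src: yData, stride: codedWidth, rows: codedHeight },         // Y
  { src: uData, stride: codedWidth / 2, rows: codedHeight / 2 }, // U
  { src: vData, stride: codedWidth / 2, rows: codedHeight / 2 }, // V
], { codedWidth, codedHeight, timestamp: 0 });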

9.2.3. Attributes

timestamp, of type unsigned long long, readonly, nullable
The presentation timestamp, given in microseconds. For decoder outputs, the timestamp is copied from the EncodedVideoChunk corresponding to this VideoFrame.
duration, of type unsigned long long, readonly, nullable
The presentation duration, given in microseconds. For decoder outputs, the duration is copied from the EncodedVideoChunk corresponding to this VideoFrame.
format, of type PixelFormat, readonly
Describes the arrangement of bytes in each plane as well as the number and order of the planes.
planes, of type FrozenArray<Plane>, readonly
Holds the pixel data, laid out as described by format and the Plane attributes.
codedWidth, of type unsigned long, readonly
Width of the VideoFrame in pixels, prior to any cropping or aspect ratio adjustments.
codedHeight, of type unsigned long, readonly
Height of the VideoFrame in pixels, prior to any cropping or aspect ratio adjustments.
cropLeft, of type unsigned long, readonly
The number of pixels to remove from the left of the VideoFrame, prior to aspect ratio adjustments.
cropTop, of type unsigned long, readonly
The number of pixels to remove from the top of the VideoFrame, prior to aspect ratio adjustments.
cropWidth, of type unsigned long, readonly
The width, in pixels, of the crop region, starting from cropLeft.
cropHeight, of type unsigned long, readonly
The height, in pixels, of the crop region, starting from cropTop.
displayWidth, of type unsigned long, readonly
Width of the VideoFrame when displayed.
displayHeight, of type unsigned long, readonly
Height of the VideoFrame when displayed.

9.2.4. Methods

destroy() Immediately frees system resources. Destruction applies to all references, including references that are serialized and passed across Realms.

NOTE: Authors should take care to manage frame lifetimes by calling destroy() immediately when frames are no longer needed.

NOTE: Use clone() to create a deep copy. Cloned frames have their own lifetime and will not be affected by destroying the original frame.

When invoked, run these steps:

  1. If [[detached]] is true, throw an InvalidStateError.

  2. Remove all Planes from planes and release associated memory.

  3. Assign true to the [[detached]] internal slot.

clone() Creates a new VideoFrame with a separate lifetime containing a deep copy of this frame’s resources.

NOTE: VideoFrames may require a large amount of memory. Use clone() sparingly.

When invoked, run the following steps:

  1. If the value of the [[detached]] internal slot is true, throw an InvalidStateError DOMException.

  2. Return the result of running the Clone Frame algorithm with this.
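For example, a non-normative sketch of retaining a deep copy past the point where the original is destroyed; frame is assumed to be a live VideoFrame:

// Non-normative sketch. `frame` is assumed to be a live VideoFrame.
const keeper = frame.clone(); // deep copy with its own lifetime
frame.destroy();              // releases only the original’s resources
// ... keeper remains valid and usable here ...
keeper.destroy();             // release the copy when finished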

createImageBitmap(options) Creates an ImageBitmap from this VideoFrame.

When invoked, run these steps:

  1. Let p be a new Promise.

  2. If either of options’s resizeWidth or resizeHeight is present and is 0, then return p rejected with an InvalidStateError DOMException.

  3. If the value of this’ [[detached]] internal slot is true, then return p rejected with an InvalidStateError DOMException.

  4. Run these steps in parallel:

    1. Let imageBitmap be a new ImageBitmap object.

    2. Set imageBitmap’s bitmap data to a copy of the VideoFrame pixel data, at the frame’s intrinsic width and intrinsic height (i.e., after any aspect-ratio correction has been applied), cropped to the source rectangle with formatting.

    3. If the origin of imageBitmap’s image is not same origin with entry settings object’s origin, then set the origin-clean flag of imageBitmap’s bitmap to false.

    4. Resolve p with imageBitmap.

  5. Return p.
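For example, a non-normative sketch of painting decoder output to a canvas and releasing each frame promptly; the canvas element and decoder wiring are illustrative assumptions:

// Non-normative sketch. The canvas element is an assumption; the decoder
// still needs to be configured and fed chunks before output is produced.
const canvas = document.querySelector('canvas') as HTMLCanvasElement;
const ctx = canvas.getContext('2d')!;

const decoder = new VideoDecoder({
  output: async (frame: VideoFrame) => {
    const bitmap = await frame.createImageBitmap();
    frame.destroy(); // release codec system resources promptly
    ctx.drawImage(bitmap, 0, 0);
  },
  error: (e: DOMException) => console.error(e),
});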

9.2.5. Algorithms

To check if a VideoFrameInit is a valid VideoFrameInit, run these steps:
  1. If codedWidth = 0 or codedHeight = 0, return false.

  2. If cropWidth = 0 or cropHeight = 0, return false.

  3. If cropTop + cropHeight > codedHeight, return false.

  4. If cropLeft + cropWidth > codedWidth, return false.

  5. If displayWidth = 0 or displayHeight = 0, return false.

  6. Return true.
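The following non-normative sketch restates this check in script, with absent members defaulted as in the constructor algorithms (applying the defaults here is an assumption made for illustration):

// Non-normative sketch of the validity check, with defaults applied.
function isValidVideoFrameInit(init: VideoFrameInit): boolean {
  const codedWidth = init.codedWidth ?? 0;
  const codedHeight = init.codedHeight ?? 0;
  if (codedWidth === 0 || codedHeight === 0) return false;
  const cropWidth = init.cropWidth ?? codedWidth;
  const cropHeight = init.cropHeight ?? codedHeight;
  if (cropWidth === 0 || cropHeight === 0) return false;
  if ((init.cropTop ?? 0) + cropHeight > codedHeight) return false;
  if ((init.cropLeft ?? 0) + cropWidth > codedWidth) return false;
  const displayWidth = init.displayWidth ?? codedWidth;
  const displayHeight = init.displayHeight ?? codedHeight;
  return displayWidth !== 0 && displayHeight !== 0;
}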

9.3. Plane Interface

A Plane acts like a thin wrapper around an ArrayBuffer, but may actually be backed by a texture. Planes hide any padding before the first sample or after the last row.

A Plane is solely constructed by its VideoFrame. During construction, the User Agent may use knowledge of the frame’s PixelFormat to add padding to the Plane to improve memory alignment.

A Plane cannot be used after the VideoFrame is destroyed. A new VideoFrame can be assembled from existing Planes, and the new VideoFrame will remain valid when the original is destroyed. This makes it possible to efficiently add an alpha plane to an existing VideoFrame.

interface Plane {
  readonly attribute unsigned long stride;
  readonly attribute unsigned long rows;
  readonly attribute unsigned long length;

  undefined readInto(ArrayBufferView dst);
};

dictionary PlaneInit {
  required BufferSource src;
  required unsigned long stride;
  required unsigned long rows;
};

9.3.1. Internal Slots

[[parent frame]]
Refers to the VideoFrame that constructed and owns this plane.
[[plane buffer]]
Internal storage for the plane’s pixel data.

9.3.2. Attributes

stride, of type unsigned long, readonly
The width of each row, in bytes, including any padding.
rows, of type unsigned long, readonly
The number of rows.
length, of type unsigned long, readonly
The total byte length of the plane (stride * rows).

9.3.3. Methods

readInto(dst)

Copies the plane data into dst.

When invoked, run these steps:

  1. If [[parent frame]] has been destroyed, throw an InvalidStateError.

  2. If length is greater than dst.byteLength, throw a TypeError.

  3. Copy the [[plane buffer]] into dst.
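For example, a non-normative sketch of copying every plane of a frame out to ArrayBuffer-backed storage; frame is assumed to be a live (not destroyed) VideoFrame:

// Non-normative sketch. `frame` is assumed to be a live VideoFrame.
function copyPlanes(frame: VideoFrame): Uint8Array[] {
  return frame.planes.map((plane) => {
    // dst must be at least plane.length bytes, or readInto throws a TypeError.
    const dst = new Uint8Array(plane.length);
    plane.readInto(dst);
    return dst;
  });
}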

9.4. Pixel Format

Pixel formats describe the arrangement of bytes in each plane as well as the number and order of the planes.

NOTE: This section needs work. We expect to add more pixel formats and offer much more verbose definitions. For now, please see http://www.fourcc.org/pixel-format/yuv-i420/ for a more complete description.

enum PixelFormat {
  "I420"
};

I420
Planar 4:2:0 YUV.
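As a non-normative illustration, for a frame with no row padding the three I420 planes have the geometry sketched below (odd dimensions are rounded up for the chroma planes):

// Non-normative sketch of unpadded I420 plane geometry: a full-resolution
// Y plane followed by quarter-resolution U and V planes.
function i420PlaneSizes(codedWidth: number, codedHeight: number) {
  const chromaWidth = Math.ceil(codedWidth / 2);
  const chromaHeight = Math.ceil(codedHeight / 2);
  return [
    { stride: codedWidth, rows: codedHeight },   // Y
    { stride: chromaWidth, rows: chromaHeight }, // U
    { stride: chromaWidth, rows: chromaHeight }, // V
  ];
}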

9.5. Algorithms

Clone Frame (with frame)
  1. Let cloneFrame be a new object of the same type as frame (either AudioFrame or VideoFrame).

  2. Initialize each attribute and internal slot of cloneFrame with a copy of the value of the corresponding attribute or internal slot of frame.

    NOTE: User Agents are encouraged to avoid expensive copies of large objects (for instance, VideoFrame pixel data). Frame types are immutable, so the above step may be implemented using memory sharing techniques such as reference counting.

  3. Return cloneFrame.

10. Security Considerations

The primary security impact is that features of this API make it easier for an attacker to exploit vulnerabilities in the underlying platform codecs. Additionally, new abilities to configure and control the codecs may allow for new exploits that rely on a specific configuration and/or sequence of control operations.

Platform codecs have historically been an internal detail of APIs like HTMLMediaElement, [WebAudio], and [WebRTC]. Even so, it has always been possible to attack the underlying codecs by using malformed media files/streams and invoking the various API control methods.

For example, an attacker can deliver an arbitrary stream to a decoder by first wrapping that stream in a media container (e.g. mp4) and setting it as the src of an HTMLMediaElement. The attacker can then cause the underlying video decoder to be reset() by setting a new value for <video>.currentTime.

WebCodecs makes such attacks easier by exposing low level control over when inputs are provided and direct access to invoke the codec control methods. This also affords attackers the ability to invoke sequences of control methods that were not previously possible via the higher level APIs.

User agents should mitigate this risk by extensively fuzzing their implementation with random inputs and control method invocations. Additionally, user agents are encouraged to isolate their underlying codecs in processes with restricted privileges (sandbox) as a barrier against successful exploits being able to read user data.

An additional concern is exposing the underlying codecs to input mutation race conditions. Specifically, it should not be possible for a site to mutate a codec input or output while the underlying codec may still be operating on that data. This concern is mitigated by ensuring that input and output interfaces are immutable.

EncodedVideoChunk and EncodedAudioChunk currently expose mutable data. See #80.

11. Privacy Considerations

The primary privacy impact is an increased ability to fingerprint users by querying for different codec capabilities to establish a codec feature profile. Much of this profile is already exposed by existing APIs. Such profiles are very unlikely to be uniquely identifying, but may be used with other metrics to create a fingerprint.

An attacker may accumulate a codec feature profile by calling the isConfigSupported() methods with a number of different configuration dictionaries. Similarly, an attacker may attempt to configure() a codec with different configuration dictionaries and observe which configurations are accepted.

Attackers may also use existing APIs to establish much of the codec feature profile. For example, the [media-capabilities] decodingInfo() API describes what types of decoders are supported and its powerEfficient attribute may signal when a decoder uses hardware acceleration. Similarly, the [WebRTC] getCapabilities() API may be used to determine what types of encoders are supported and the getStats() API may be used to determine when an encoder uses hardware acceleration. WebCodecs will expose some additional information in the form of low level codec features.

A codec feature profile alone is unlikely to be uniquely identifying. Underlying codecs are often implemented entirely in software (be it part of the user agent binary or part of the operating system), such that all users who run that software will have a common set of capabilities. Additionally, underlying codecs are often implemented with hardware acceleration, but such hardware is mass produced and devices of a particular class and manufacture date (e.g. flagship phones manufactured in 2020) will often have common capabilities. There will be outliers (some users may run outdated versions of software codecs or use a rare mix of custom assembled hardware), but most of the time a given codec feature profile is shared by a large group of users.

Segmenting groups of users by codec feature profile still amounts to a bit of entropy that can be combined with other metrics to uniquely identify a user. User agents may partially mitigate this by returning an error whenever a site attempts to exhaustively probe for codec capabilities. Additionally, user agents may implement a "privacy budget", which depletes as authors use WebCodecs and other identifying APIs. Upon exhaustion of the privacy budget, codec capabilities could be reduced to a common baseline or prompt for user approval.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.


References

Normative References

[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[ISO14496-10]
Information technology — Coding of audio-visual objects — Part 10: Advanced video coding. December 2020. Published. URL: https://www.iso.org/standard/75400.html
[ISO14496-15]
Information technology — Coding of audio-visual objects — Part 15: Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format. September 2019. Published. URL: https://www.iso.org/standard/74429.html
[MEDIA-CAPABILITIES]
Mounir Lamouri; Chris Cunningham; Vi Nguyen. Media Capabilities. 30 January 2020. WD. URL: https://www.w3.org/TR/media-capabilities/
[MEDIA-SOURCE]
Matthew Wolenetz; et al. Media Source Extensions™. 17 November 2016. REC. URL: https://www.w3.org/TR/media-source/
[MIMESNIFF]
Gordon P. Hemsley. MIME Sniffing Standard. Living Standard. URL: https://mimesniff.spec.whatwg.org/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WebAudio]
Paul Adenot; Hongchan Choi. Web Audio API. 14 January 2021. CR. URL: https://www.w3.org/TR/webaudio/
[WebIDL]
Boris Zbarsky. Web IDL. 15 December 2016. ED. URL: https://heycam.github.io/webidl/
[WebRTC]
Cullen Jennings; Henrik Boström; Jan-Ivar Bruaroey. WebRTC 1.0: Real-Time Communication Between Browsers. 26 January 2021. REC. URL: https://www.w3.org/TR/webrtc/


IDL Index

[Exposed=(Window,DedicatedWorker)]
interface AudioDecoder {
  constructor(AudioDecoderInit init);

  readonly attribute CodecState state;
  readonly attribute long decodeQueueSize;

  undefined configure(AudioDecoderConfig config);
  undefined decode(EncodedAudioChunk chunk);
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<AudioDecoderSupport> isConfigSupported(AudioDecoderConfig config);
};

dictionary AudioDecoderInit {
  required AudioFrameOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback AudioFrameOutputCallback = undefined(AudioFrame output);


[Exposed=(Window,DedicatedWorker)]
interface VideoDecoder {
  constructor(VideoDecoderInit init);

  readonly attribute CodecState state;
  readonly attribute long decodeQueueSize;

  undefined configure(VideoDecoderConfig config);
  undefined decode(EncodedVideoChunk chunk);
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<VideoDecoderSupport> isConfigSupported(VideoDecoderConfig config);
};

dictionary VideoDecoderInit {
  required VideoFrameOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback VideoFrameOutputCallback = undefined(VideoFrame output);


[Exposed=(Window,DedicatedWorker)]
interface AudioEncoder {
  constructor(AudioEncoderInit init);

  readonly attribute CodecState state;
  readonly attribute long encodeQueueSize;

  undefined configure(AudioEncoderConfig config);
  undefined encode(AudioFrame frame);
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<AudioEncoderSupport> isConfigSupported(AudioEncoderConfig config);
};

dictionary AudioEncoderInit {
  required EncodedAudioChunkOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback EncodedAudioChunkOutputCallback = undefined(EncodedAudioChunk output);


[Exposed=(Window,DedicatedWorker)]
interface VideoEncoder {
  constructor(VideoEncoderInit init);

  readonly attribute CodecState state;
  readonly attribute long encodeQueueSize;

  undefined configure(VideoEncoderConfig config);
  undefined encode(VideoFrame frame, optional VideoEncoderEncodeOptions options = {});
  Promise<undefined> flush();
  undefined reset();
  undefined close();

  static Promise<VideoEncoderSupport> isConfigSupported(VideoEncoderConfig config);
};

dictionary VideoEncoderInit {
  required EncodedVideoChunkOutputCallback output;
  required WebCodecsErrorCallback error;
};

callback EncodedVideoChunkOutputCallback = undefined(EncodedVideoChunk output, VideoDecoderConfig? output_config);


dictionary AudioDecoderSupport {
  boolean supported;
  AudioDecoderConfig config;
};


dictionary VideoDecoderSupport {
  boolean supported;
  VideoDecoderConfig config;
};


dictionary AudioEncoderSupport {
  boolean supported;
  AudioEncoderConfig config;
};


dictionary VideoEncoderSupport {
  boolean supported;
  VideoEncoderConfig config;
};


dictionary AudioDecoderConfig {
  required DOMString codec;
  required unsigned long sampleRate;
  required unsigned long numberOfChannels;
  BufferSource description;
};


dictionary VideoDecoderConfig {
  required DOMString codec;
  BufferSource description;
  required unsigned long codedWidth;
  required unsigned long codedHeight;
  unsigned long cropLeft;
  unsigned long cropTop;
  unsigned long cropWidth;
  unsigned long cropHeight;
  unsigned long displayWidth;
  unsigned long displayHeight;
  HardwareAcceleration hardwareAcceleration = "allow";
};


dictionary AudioEncoderConfig {
  required DOMString codec;
  unsigned long sampleRate;
  unsigned long numberOfChannels;
};


dictionary VideoEncoderConfig {
  required DOMString codec;
  unsigned long long bitrate;
  required unsigned long cropWidth;
  required unsigned long cropHeight;
  unsigned long displayWidth;
  unsigned long displayHeight;
  HardwareAcceleration hardwareAcceleration = "allow";

  AvcEncoderConfig avc;
};


dictionary AvcEncoderConfig {
  AvcBitstreamFormat format = "avc";
};


enum AvcBitstreamFormat {
  "annexb",
  "avc",
};


enum HardwareAcceleration {
  "allow",
  "deny",
  "require",
};


dictionary VideoEncoderEncodeOptions {
  boolean keyFrame = false;
};


enum CodecState {
  "unconfigured",
  "configured",
  "closed"
};


callback WebCodecsErrorCallback = undefined(DOMException error);


[Exposed=(Window,DedicatedWorker)]
interface EncodedAudioChunk {
  constructor(EncodedAudioChunkInit init);
  readonly attribute EncodedAudioChunkType type;
  readonly attribute unsigned long long timestamp;  // microseconds
  readonly attribute ArrayBuffer data;
};

dictionary EncodedAudioChunkInit {
  required EncodedAudioChunkType type;
  required unsigned long long timestamp;
  required BufferSource data;
};

enum EncodedAudioChunkType {
    "key",
    "delta",
};


[Exposed=(Window,DedicatedWorker)]
interface EncodedVideoChunk {
  constructor(EncodedVideoChunkInit init);
  readonly attribute EncodedVideoChunkType type;
  readonly attribute unsigned long long timestamp;  // microseconds
  readonly attribute unsigned long long? duration;  // microseconds
  readonly attribute ArrayBuffer data;
};

dictionary EncodedVideoChunkInit {
  required EncodedVideoChunkType type;
  required unsigned long long timestamp;
  unsigned long long duration;
  required BufferSource data;
};

enum EncodedVideoChunkType {
    "key",
    "delta",
};


[Exposed=(Window,DedicatedWorker)]
interface AudioFrame {
  constructor(AudioFrameInit init);
  readonly attribute unsigned long long timestamp;
  readonly attribute AudioBuffer? buffer;
  undefined close();
};

dictionary AudioFrameInit {
  required unsigned long long timestamp;
  required AudioBuffer buffer;
};


[Exposed=(Window,DedicatedWorker)]
interface VideoFrame {
  constructor(ImageBitmap imageBitmap, optional VideoFrameInit frameInit = {});
  constructor(PixelFormat pixelFormat, sequence<(Plane or PlaneInit)> planes,
              optional VideoFrameInit frameInit = {});

  readonly attribute PixelFormat format;
  readonly attribute FrozenArray<Plane> planes;
  readonly attribute unsigned long codedWidth;
  readonly attribute unsigned long codedHeight;
  readonly attribute unsigned long cropLeft;
  readonly attribute unsigned long cropTop;
  readonly attribute unsigned long cropWidth;
  readonly attribute unsigned long cropHeight;
  readonly attribute unsigned long displayWidth;
  readonly attribute unsigned long displayHeight;
  readonly attribute unsigned long long? duration;
  readonly attribute unsigned long long? timestamp;

  undefined destroy();
  VideoFrame clone();

  Promise<ImageBitmap> createImageBitmap(
    optional ImageBitmapOptions options = {});

};

dictionary VideoFrameInit {
  unsigned long codedWidth;
  unsigned long codedHeight;
  unsigned long cropLeft;
  unsigned long cropTop;
  unsigned long cropWidth;
  unsigned long cropHeight;
  unsigned long displayWidth;
  unsigned long displayHeight;
  unsigned long long duration;
  unsigned long long timestamp;
};


interface Plane {
  readonly attribute unsigned long stride;
  readonly attribute unsigned long rows;
  readonly attribute unsigned long length;

  undefined readInto(ArrayBufferView dst);
};

dictionary PlaneInit {
  required BufferSource src;
  required unsigned long stride;
  required unsigned long rows;
};


enum PixelFormat {
  "I420"
};


Issues Index

The intent is for encoder_config to be the [[active encoder config]] that was used to encode output. But, as written, it may occur that output was encoded using a previous VideoEncoderConfig that has since been replaced by a later call to configure(). See #138.
EncodedVideoChunk and EncodedAudioChunk currently expose mutable data. See #80.