When playing audio through WebAudio, we want to be able to measure both the delay of that audio and its glitchiness. This document contains a proposal for an API that would allow WebAudio users to do this.

This is an unofficial proposal.

Introduction

There is currently no way to detect whether WebAudio playout has glitches (gaps in the played audio, typically caused by underperformance in the audio pipeline). There is an existing way to measure the instantaneous playout latency using AudioContext.outputLatency, but no simple way to measure average/minimum/maximum latency over time.

Glitches and high latency are bad for the user experience, so if either occurs it can be useful for the application to detect this and possibly take some action to improve the playout.

The {{AudioContext/AudioPlayoutStats}} under {{AudioContext}} is a dedicated object for statistics reporting. It is similar to RTCAudioPlayoutStats, but it covers the playout path via {{AudioDestinationNode}} and the associated output device. This will allow us to measure glitches occurring due to underperforming {{AudioWorklet}}s as well as glitches and delay occurring in the playout path between the {{AudioContext}} and the output device.

Audio glitches are expressed in terms of fallback frames and fallback events.

Extension of the AudioContext interface

    partial interface AudioContext {
      [SameObject] readonly attribute AudioPlayoutStats playoutStats;
    };
    

{{AudioContext}} has the following internal slot: [[\playout stats]], an instance of {{AudioContext/AudioPlayoutStats}}, initially null.

Attributes

playoutStats attribute

When accessing this attribute, run the following steps:

  1. If the {{AudioContext/[[playout stats]]}} slot is null, initialize the slot with the result of the [= initialize playout stats =] algorithm, with this as input.
  2. Return the value of the {{AudioContext/[[playout stats]]}} internal slot.

When the {{AudioContext}} is in the "running" state, once per second, the [= update playout stats =] algorithm runs on {{AudioContext/[[playout stats]]}}.
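
As a rough illustration of the getter steps above, the following sketch models the [[playout stats]] slot with a private class field. `ModelAudioContext` and the placeholder stats object are hypothetical names used only for this example; they are not part of the proposal or of any browser implementation.

```javascript
// Hypothetical model of the lazy [[playout stats]] slot; not browser code.
class ModelAudioContext {
  // Models the [[playout stats]] internal slot, initially null.
  #playoutStats = null;

  get playoutStats() {
    if (this.#playoutStats === null) {
      // Placeholder standing in for the "initialize playout stats" algorithm.
      this.#playoutStats = { audioContext: this };
    }
    return this.#playoutStats;
  }
}

const modelContext = new ModelAudioContext();
const firstAccess = modelContext.playoutStats;
const secondAccess = modelContext.playoutStats;
console.log(firstAccess === secondAccess); // true
```

Because the slot is filled on first access and never replaced, every read of the attribute yields the same object, which is what the [SameObject] annotation in the IDL promises.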

AudioPlayoutStats interface

    [Exposed=Window, SecureContext]
    interface AudioPlayoutStats {
      readonly attribute DOMHighResTimeStamp fallbackFramesDuration;
      readonly attribute unsigned long fallbackFramesEvents;
      readonly attribute DOMHighResTimeStamp totalFramesDuration;
      readonly attribute DOMHighResTimeStamp averageLatency;
      readonly attribute DOMHighResTimeStamp minimumLatency;
      readonly attribute DOMHighResTimeStamp maximumLatency;
      undefined resetLatency();
      [Default] object toJSON();
    };
    
It has the following internal slots:
[[\fallback frames duration]]
A timestamp representing the total duration of fallback frames that the {{AudioContext}} has played as of the last stat update. Initialized to 0.
[[\fallback frames events]]
An integer representing the total number of fallback events that have occurred as of the last stat update. Initialized to 0.
[[\total frames duration]]
A timestamp representing the total duration of all audio frames that the {{AudioContext}} has played as of the last stat update. Initialized to 0.
[[\average latency]]
A timestamp representing the average playout latency over the currently tracked interval.
[[\minimum latency]]
A timestamp representing the minimum playout latency over the currently tracked interval.
[[\maximum latency]]
A timestamp representing the maximum playout latency over the currently tracked interval.
[[\latency reset time]]
A timestamp representing the time at which the latency statistics were last reset.
[[\audio context]]
The {{AudioContext}} owning the {{AudioPlayoutStats}} instance.

Attributes

These attributes update only once per second and under specific conditions. See the Mitigations section under Security & Privacy considerations and the [= update playout stats =] algorithm for details.

fallbackFramesDuration attribute

Returns {{AudioPlayoutStats/[[fallback frames duration]]}}.

Measures the duration of [=fallback frame=]s played by the {{AudioContext}}, in milliseconds. This metric can be used together with {{AudioPlayoutStats/totalFramesDuration}} to calculate the percentage of played out media that was not provided by the {{AudioContext}}.

fallbackFramesEvents attribute

Returns {{AudioPlayoutStats/[[fallback frames events]]}}.

This measures the number of [=fallback event=]s that have occurred in the {{AudioContext}}.

totalFramesDuration attribute

Returns {{AudioPlayoutStats/[[total frames duration]]}}.

Measures the duration of all audio frames played by the {{AudioContext}}, in milliseconds. Includes both fallback and non-fallback frames.
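
The two cumulative durations can be combined into the percentage mentioned above. The sketch below is a plain function over any object that carries these two attributes. `fallbackPercentage` is a hypothetical helper name, and returning 0 when nothing has been played yet is an assumption of this example rather than something the proposal specifies.

```javascript
// Hypothetical helper: share of played-out audio that consisted of fallback frames.
function fallbackPercentage(stats) {
  // Assumption: report 0 before any audio has been accounted for.
  if (stats.totalFramesDuration <= 0) return 0;
  return (100 * stats.fallbackFramesDuration) / stats.totalFramesDuration;
}

// Example with made-up values: 250 ms of fallback frames out of 10 s of playout.
const sampleStats = { fallbackFramesDuration: 250, totalFramesDuration: 10000 };
console.log(fallbackPercentage(sampleStats)); // 2.5
```

In a page, the same function could be called with `audioContext.playoutStats` directly, since it only reads the two attributes.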

averageLatency attribute

Returns {{AudioPlayoutStats/[[average latency]]}}.

This measures the average latency for the frames played since the last call to {{AudioPlayoutStats/resetLatency()}}, or since the creation of the {{AudioContext}} if {{AudioPlayoutStats/resetLatency()}} has not been called.

minimumLatency attribute

Returns {{AudioPlayoutStats/[[minimum latency]]}}.

This measures the minimum latency for the frames played since the last call to {{AudioPlayoutStats/resetLatency()}}, or since the creation of the {{AudioContext}} if {{AudioPlayoutStats/resetLatency()}} has not been called.

maximumLatency attribute

Returns {{AudioPlayoutStats/[[maximum latency]]}}.

This measures the maximum latency for the frames played since the last call to {{AudioPlayoutStats/resetLatency()}}, or since the creation of the {{AudioContext}} if {{AudioPlayoutStats/resetLatency()}} has not been called.

Methods

resetLatency method

This method resets the latency counters. Note that it does not remove latency information that has accrued but not yet been exposed through the API.

When resetLatency() is called, run the [= reset latency stats =] steps.

Internal algorithms

Running the initialize playout stats algorithm means running these steps:

  1. Let owningAudioContext be the {{AudioContext}} passed into this algorithm.
  2. Let playoutStats be a new instance of {{AudioPlayoutStats}}.
  3. Set {{AudioPlayoutStats/[[audio context]]}} in playoutStats to owningAudioContext.
  4. Run the [= reset latency stats =] steps on playoutStats.
  5. Return playoutStats.

Running the reset latency stats algorithm on an {{AudioPlayoutStats}} means running these steps:

  1. Let currentLatency be the playout latency of the last frame played by {{AudioPlayoutStats/[[audio context]]}}, or 0 if no frames have been played out yet.
  2. Set {{AudioPlayoutStats/[[latency reset time]]}} to the current time.
  3. Set {{AudioPlayoutStats/[[average latency]]}} to currentLatency.
  4. Set {{AudioPlayoutStats/[[minimum latency]]}} to currentLatency.
  5. Set {{AudioPlayoutStats/[[maximum latency]]}} to currentLatency.
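
The reset steps can be sketched on a plain object as follows. The camelCase property names mirror the internal slots, while `resetLatencyStats`, `lastFrameLatencyMs`, and `nowMs` are hypothetical names introduced for this illustration.

```javascript
// Sketch of the "reset latency stats" steps on a plain object (not browser code).
function resetLatencyStats(stats, lastFrameLatencyMs, nowMs) {
  // Step 1: latency of the last played frame, or 0 if nothing was played yet.
  const currentLatency = lastFrameLatencyMs ?? 0;
  // Step 2: remember when the tracked interval started.
  stats.latencyResetTime = nowMs;
  // Steps 3-5: average, minimum and maximum all restart from the same value.
  stats.averageLatency = currentLatency;
  stats.minimumLatency = currentLatency;
  stats.maximumLatency = currentLatency;
  return stats;
}

// No frame played yet: everything starts at 0.
const freshStats = resetLatencyStats({}, undefined, 0);
// Reset during playout: the previous interval's values are discarded.
const resetStats = resetLatencyStats(
  { averageLatency: 35, minimumLatency: 20, maximumLatency: 50 }, 24, 5000);
console.log(freshStats.averageLatency, resetStats.maximumLatency); // 0 24
```

Starting all three values from the latency of the last played frame means the attributes never report a stale extreme from before the reset.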

Running the update playout stats algorithm on an {{AudioPlayoutStats}} means running these steps:

  1. If {{AudioPlayoutStats/[[audio context]]}} is not running, abort these steps.
  2. Let document be the current object's associated document.
  3. Let canUpdate be false.
  4. If document is fully active and visible, set canUpdate to true.
  5. Let permission be the [=permission state=] for the permission associated with [="microphone"=] access. If permission is "granted", set canUpdate to true.
  6. If canUpdate is false, abort these steps.
  7. Set {{AudioPlayoutStats/[[fallback frames duration]]}} to the total duration of all fallback frames (in milliseconds) that {{AudioPlayoutStats/[[audio context]]}} has played since the creation of {{AudioPlayoutStats/[[audio context]]}}.
  8. Set {{AudioPlayoutStats/[[fallback frames events]]}} to the number of times that {{AudioPlayoutStats/[[audio context]]}} has played a fallback frame after a non-fallback frame since the creation of {{AudioPlayoutStats/[[audio context]]}}.
  9. Set {{AudioPlayoutStats/[[total frames duration]]}} to the total duration of all frames (in milliseconds) that {{AudioPlayoutStats/[[audio context]]}} has played since the creation of {{AudioPlayoutStats/[[audio context]]}}.
  10. Set {{AudioPlayoutStats/[[average latency]]}} to the average playout latency (in milliseconds) for frames played by {{AudioPlayoutStats/[[audio context]]}} since {{AudioPlayoutStats/[[latency reset time]]}}.
  11. Set {{AudioPlayoutStats/[[minimum latency]]}} to the minimum playout latency (in milliseconds) for frames played by {{AudioPlayoutStats/[[audio context]]}} since {{AudioPlayoutStats/[[latency reset time]]}}.
  12. Set {{AudioPlayoutStats/[[maximum latency]]}} to the maximum playout latency (in milliseconds) for frames played by {{AudioPlayoutStats/[[audio context]]}} since {{AudioPlayoutStats/[[latency reset time]]}}.
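
To make the update steps concrete, here is a minimal model that recomputes the slots from a log of played frames. Everything about the log format (`durationMs`, `fallback`, `latencyMs`, `playedAtMs`), the `env` flags, and the function names is hypothetical. The sketch also assumes an unweighted per-frame average, that a fallback frame at the very start of playout does not count as a fallback event, and that the latency values stay unchanged when no frame was played since the last reset; the proposal does not spell out these details.

```javascript
// Steps 1-6: decide whether the exposed values may be refreshed.
function canUpdate(env) {
  if (!env.contextRunning) return false;
  const activeAndVisible = env.documentFullyActive && env.documentVisible;
  return activeAndVisible || env.microphonePermission === "granted";
}

// Steps 7-12: recompute all slots from a (hypothetical) log of played frames.
function updatePlayoutStats(stats, frames, env) {
  if (!canUpdate(env)) return stats; // exposed values stay as they were

  let fallbackMs = 0;
  let totalMs = 0;
  let fallbackEvents = 0;
  let previous = null;
  for (const frame of frames) {
    totalMs += frame.durationMs;
    if (frame.fallback) {
      fallbackMs += frame.durationMs;
      // A fallback event is a fallback frame right after a non-fallback frame.
      if (previous !== null && !previous.fallback) fallbackEvents += 1;
    }
    previous = frame;
  }
  stats.fallbackFramesDuration = fallbackMs;
  stats.fallbackFramesEvents = fallbackEvents;
  stats.totalFramesDuration = totalMs;

  // Latency statistics only cover frames played since the last reset.
  const latencies = frames
    .filter((frame) => frame.playedAtMs >= stats.latencyResetTime)
    .map((frame) => frame.latencyMs);
  if (latencies.length > 0) {
    const sum = latencies.reduce((a, b) => a + b, 0);
    stats.averageLatency = sum / latencies.length;
    stats.minimumLatency = Math.min(...latencies);
    stats.maximumLatency = Math.max(...latencies);
  }
  return stats;
}

// Made-up frame log: two separate glitches, 30 ms of fallback audio in total.
const frameLog = [
  { durationMs: 10, fallback: false, latencyMs: 20, playedAtMs: 0 },
  { durationMs: 10, fallback: true, latencyMs: 30, playedAtMs: 10 },
  { durationMs: 10, fallback: true, latencyMs: 40, playedAtMs: 20 },
  { durationMs: 10, fallback: false, latencyMs: 20, playedAtMs: 30 },
  { durationMs: 10, fallback: true, latencyMs: 50, playedAtMs: 40 },
];
const visibleEnv = {
  contextRunning: true,
  documentFullyActive: true,
  documentVisible: true,
  microphonePermission: "prompt",
};
const updated = updatePlayoutStats({ latencyResetTime: 30 }, frameLog, visibleEnv);
console.log(updated.fallbackFramesDuration, updated.fallbackFramesEvents); // 30 2
```

Note that the cumulative counters are always computed since the creation of the context, whereas the latency values only look back to the last reset.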

Usage Example

This example shows how the API can be used to calculate the fallback frames fraction, the fallback event frequency, and the average playout delay over a time interval:

    // Snapshot the cumulative counters and start a new latency interval.
    const oldTotalFramesDuration = audioContext.playoutStats.totalFramesDuration;
    const oldFallbackFramesDuration = audioContext.playoutStats.fallbackFramesDuration;
    const oldFallbackFramesEvents = audioContext.playoutStats.fallbackFramesEvents;
    audioContext.playoutStats.resetLatency();

    // Wait while playing audio
    // ...

    // Duration, in seconds, of the audio played between the two snapshots.
    // Note: this is 0 if the stats have not been updated in the meantime.
    const deltaTotalFramesDuration = (audioContext.playoutStats.totalFramesDuration - oldTotalFramesDuration) / 1000;
    const deltaFallbackFramesDuration = (audioContext.playoutStats.fallbackFramesDuration - oldFallbackFramesDuration) / 1000;
    const deltaFallbackFramesEvents = audioContext.playoutStats.fallbackFramesEvents - oldFallbackFramesEvents;

    // Fallback frames fraction over the last deltaTotalFramesDuration seconds
    const fallbackFramesFraction = deltaFallbackFramesDuration / deltaTotalFramesDuration;
    // Fallback event frequency (events per second) over the last deltaTotalFramesDuration seconds
    const fallbackEventFrequency = deltaFallbackFramesEvents / deltaTotalFramesDuration;
    // Average playout delay, in seconds, during the last deltaTotalFramesDuration seconds
    const playoutDelay = audioContext.playoutStats.averageLatency / 1000;
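
The same computation can be packaged as a reusable helper that works on two snapshots of the stats, for instance two results of `toJSON()`. This is only a sketch: `intervalStats` is a hypothetical name, and returning `null` when no audio was accounted for between the snapshots (which can happen because the values update at most once per second) is a choice made for this example.

```javascript
// Hypothetical helper: derive interval statistics from two stats snapshots.
function intervalStats(before, after) {
  const totalSeconds = (after.totalFramesDuration - before.totalFramesDuration) / 1000;
  // Assumption: without any newly accounted audio there is nothing to report.
  if (totalSeconds <= 0) return null;
  const fallbackSeconds =
    (after.fallbackFramesDuration - before.fallbackFramesDuration) / 1000;
  const fallbackEvents = after.fallbackFramesEvents - before.fallbackFramesEvents;
  return {
    totalSeconds,
    // Fraction of the interval that consisted of fallback frames.
    fallbackFramesFraction: fallbackSeconds / totalSeconds,
    // Fallback events per second of played audio.
    fallbackEventFrequency: fallbackEvents / totalSeconds,
  };
}

// Made-up snapshots: 10 s of audio, 500 ms of it fallback, spread over 4 events.
const before = { totalFramesDuration: 1000, fallbackFramesDuration: 0, fallbackFramesEvents: 0 };
const after = { totalFramesDuration: 11000, fallbackFramesDuration: 500, fallbackFramesEvents: 4 };
console.log(intervalStats(before, after));
```

In a page, the snapshots could be taken with `audioContext.playoutStats.toJSON()` before and after the interval of interest.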

      

Security & Privacy considerations

Covert channel

See discussion with Privacy WG.

The glitch information provided by the API could be used to form a cross-site covert channel between two cooperating websites. One site could transmit information by intentionally causing audio glitches (by causing very high CPU usage, for example) while the other site could use the API to detect these glitches.

Mitigations

To inhibit the usage of this covert channel, implementers should apply these mitigations.

Rate-limiting

Implementers should not update the values returned by the API more often than once per second. This limits the bandwidth of the covert channel.
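
One possible way to picture this mitigation is a holder that keeps receiving fresh internal values but only publishes them to the page when at least one second has passed since the previous publication. The class below is a hypothetical sketch, not taken from any implementation; `RateLimitedSnapshot`, `offer`, and the explicit `nowMs` argument are names chosen for the example.

```javascript
// Hypothetical sketch of a once-per-second publication gate.
class RateLimitedSnapshot {
  constructor(initialValue, minIntervalMs = 1000) {
    this.exposedValue = initialValue;
    this.minIntervalMs = minIntervalMs;
    this.lastPublishedMs = -Infinity; // nothing published yet
  }

  // Offer a fresh internal value; it becomes visible only if enough time passed.
  offer(value, nowMs) {
    if (nowMs - this.lastPublishedMs >= this.minIntervalMs) {
      this.exposedValue = value;
      this.lastPublishedMs = nowMs;
    }
    return this.exposedValue;
  }
}

const exposedEvents = new RateLimitedSnapshot(0);
exposedEvents.offer(1, 0);    // published: first update
exposedEvents.offer(2, 400);  // ignored: only 400 ms since the last publication
exposedEvents.offer(3, 1000); // published: 1000 ms since the last publication
console.log(exposedEvents.exposedValue); // 3
```

A receiver polling the exposed value can therefore observe at most one change per second, no matter how often glitches occur in between.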

Restricting API access

Implementers should only provide access to the API to sites that fulfill at least one of the following criteria:

  1. The site has obtained getUserMedia permission.

    The reasoning is that a site that has obtained getUserMedia permission can already exchange covert signals and detect glitches more efficiently by using the microphone. For example:

    • One site can transmit a message by playing ultrasound (inaudible to humans) through the speaker, and the site with the getUserMedia permission can listen to it through the microphone.
    • The site can play a sine wave at a frequency inaudible to humans and listen to it through the microphone. By checking for gaps in the recorded signal, glitches can be detected much more reliably than with the API.
    Therefore, restricting access to the API for such a site does not help mitigate the covert channel.

  2. The site is active/visible.

    Assuming that neither cooperating site has microphone permission, this criterion ensures that the site that receives the covert signal must be visible, restricting the conditions under which the covert channel can be used. It makes it impossible for sites to communicate with each other using the covert channel while not visible.

Future mitigations

If microphone access becomes more secure in the future, the reasoning for allowing API access to sites with getUserMedia permission may no longer apply. If this happens, the API access requirements should be reevaluated.