Compression Streams

Draft Community Group Report

This version:
https://wicg.github.io/compression/
Issue Tracking:
GitHub
Editors:
Canon Mukai (Google)
Adam Rice (Google)

Abstract

This document defines a set of JavaScript APIs to compress and decompress streams of binary data.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

This section is non-normative.

The APIs defined in this specification compress and decompress streams of data. They support the "deflate" and "gzip" compression formats, both of which are widely used by web developers.

2. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MUST and SHOULD are to be interpreted as described in [RFC2119].

This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

Implementations that use ECMAScript to implement the APIs defined in this specification MUST implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [WebIDL], as this specification uses that specification and terminology.

3. Terminology

A chunk is a piece of data. In the case of CompressionStream and DecompressionStream, the output chunk type is Uint8Array. They accept any BufferSource type as input.
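As an illustration of these chunk types, the following sketch writes a single BufferSource into a CompressionStream and collects the chunks that come out. The helper name collectChunks is hypothetical and is not defined by this specification.

```javascript
// Hypothetical helper (not part of this specification): writes one
// BufferSource chunk into a CompressionStream and returns the output chunks.
async function collectChunks(input, format = 'gzip') {
  const cs = new CompressionStream(format);
  const writer = cs.writable.getWriter();
  // Not awaited: these promises may not settle until the readable side is
  // drained below. Failures surface through reader.read() instead.
  writer.write(input).catch(() => {});
  writer.close().catch(() => {});
  const chunks = [];
  const reader = cs.readable.getReader();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    chunks.push(value); // each output chunk is a Uint8Array
  }
  return chunks;
}
```

The number and size of the output chunks are up to the implementation, so callers should not rely on receiving exactly one chunk.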

A stream represents an ordered sequence of chunks. The terms ReadableStream and WritableStream are defined in [WHATWG-STREAMS].

A compression context is the internal state maintained by a compression or decompression algorithm. The contents of a compression context depend on the format, algorithm and implementation in use. From the point of view of this specification, it is an opaque object. A compression context is initially in a start state such that it anticipates the first byte of input.

4. Supported formats

deflate

"ZLIB Compressed Data Format" [RFC1950]

  • This format is referred to as "deflate" for consistency with HTTP Content-Encodings. See [RFC7230] section 4.2.2.

  • Implementations must be "compliant" as described in [RFC1950] section 2.3.

  • Field values described as invalid in [RFC1950] must not be created by CompressionStream, and are errors for DecompressionStream.

  • The only valid value of the CM (Compression method) part of the CMF field is 8.

  • The FDICT flag is not supported by these APIs, and will error the stream if set.

  • The FLEVEL flag is ignored by DecompressionStream.

  • It is an error for DecompressionStream if the ADLER32 checksum is not correct.

  • It is an error if there is additional input data after the ADLER32 checksum.
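The header-level rules in the list above can be sketched as a check on the first two bytes of a zlib stream. This is a minimal illustration under stated assumptions, not the procedure a user agent is required to follow; the function name checkZlibHeader and its return shape are invented for this example. The ADLER32 and trailing-data rules apply to the end of the stream and are not covered here.

```javascript
// Hypothetical helper: validates the two-byte zlib header (CMF, FLG) from
// RFC 1950. It never looks at the compressed data that follows.
function checkZlibHeader(bytes) {
  if (bytes.length < 2) {
    throw new TypeError('zlib header is truncated');
  }
  const cmf = bytes[0];
  const flg = bytes[1];
  // CM is the low 4 bits of CMF; 8 is the only valid value.
  if ((cmf & 0x0f) !== 8) {
    throw new TypeError('unsupported compression method');
  }
  // CINFO is the high 4 bits of CMF; RFC 1950 does not allow values above 7.
  if ((cmf >> 4) > 7) {
    throw new TypeError('invalid window size');
  }
  // FCHECK: CMF and FLG, read as a 16-bit big-endian number, must be a
  // multiple of 31.
  if (((cmf << 8) | flg) % 31 !== 0) {
    throw new TypeError('header check (FCHECK) failed');
  }
  // FDICT is bit 5 of FLG; preset dictionaries are not supported.
  if (flg & 0x20) {
    throw new TypeError('preset dictionary (FDICT) is not supported');
  }
  // FLEVEL (bits 6-7 of FLG) is reported for information only; a
  // decompressor ignores it.
  return { windowBits: (cmf >> 4) + 8, flevel: flg >> 6 };
}
```

For example, the common header bytes 0x78 0x9C pass every check, while 0x78 0xBB is rejected because FDICT is set.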

gzip

"GZIP file format" [RFC1952]

  • Implementations must be "compliant" as described in [RFC1952] section 2.3.1.2.

  • Field values described as invalid in [RFC1952] must not be created by CompressionStream, and are errors for DecompressionStream.

  • The only valid value of the CM (Compression Method) field is 8.

  • The FTEXT flag must be ignored by DecompressionStream.

  • If the FHCRC field is present, it is an error for it to be incorrect.

  • The contents of any FEXTRA, FNAME and FCOMMENT fields must be ignored by DecompressionStream, except to verify that they are terminated correctly.

  • The contents of the MTIME, XFL and OS fields must be ignored by DecompressionStream.

  • It is an error if CRC32 or ISIZE do not match the decompressed data.

  • A gzip stream may only contain one "member".

  • It is an error if there is additional input data after the end of the "member".
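The gzip header rules above can be sketched as a parser that skips the optional fields and reports where the compressed data begins. This is an illustrative sketch only: the name parseGzipHeader is hypothetical, and for brevity it skips the FHCRC bytes without verifying them, which a conforming implementation would need to do. The CRC32, ISIZE and trailing-data rules concern the end of the member and are not shown.

```javascript
// Hypothetical helper: walks a gzip member header (RFC 1952) and returns the
// byte offset at which the compressed data starts.
function parseGzipHeader(bytes) {
  const fail = (message) => { throw new TypeError(message); };
  if (bytes.length < 10) fail('gzip header is truncated');
  if (bytes[0] !== 0x1f || bytes[1] !== 0x8b) fail('not a gzip member');
  // CM: 8 is the only valid value.
  if (bytes[2] !== 8) fail('unsupported compression method');
  const flg = bytes[3];
  // Bits 5-7 of FLG are reserved and must be zero.
  if (flg & 0xe0) fail('reserved FLG bits are set');
  // Bytes 4-9 hold MTIME, XFL and OS; their contents are ignored.
  // FTEXT (bit 0) is ignored as well.
  let pos = 10;
  if (flg & 0x04) {
    // FEXTRA: 2-byte little-endian length followed by that many bytes.
    if (pos + 2 > bytes.length) fail('FEXTRA length is truncated');
    pos += 2 + (bytes[pos] | (bytes[pos + 1] << 8));
    if (pos > bytes.length) fail('FEXTRA field is truncated');
  }
  // FNAME (bit 3), then FCOMMENT (bit 4): zero-terminated strings whose
  // contents are ignored, but which must be terminated.
  for (const bit of [0x08, 0x10]) {
    if (flg & bit) {
      const end = bytes.indexOf(0, pos);
      if (end === -1) fail('unterminated header string');
      pos = end + 1;
    }
  }
  if (flg & 0x02) {
    // FHCRC: 2-byte header CRC. Skipped here; NOT verified in this sketch.
    pos += 2;
    if (pos > bytes.length) fail('FHCRC field is truncated');
  }
  return pos;
}
```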

5. Interface CompressionStream

[Exposed=(Window,Worker)]
interface CompressionStream {
  constructor(DOMString format);
};
CompressionStream includes GenericTransformStream;

A CompressionStream has an associated format and compression context context.

The new CompressionStream(format) steps are:

  1. If format is unsupported in CompressionStream, then throw a TypeError.

  2. Set this's format to format.

  3. Let transformAlgorithm be an algorithm which takes a chunk argument and runs the compress and enqueue a chunk algorithm with this and chunk.

  4. Let flushAlgorithm be an algorithm which takes no argument and runs the compress flush and enqueue algorithm with this.

  5. Set this's transform to a new TransformStream.

  6. Set up this's transform with transformAlgorithm set to transformAlgorithm and flushAlgorithm set to flushAlgorithm.
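Step 1 of the constructor can be observed from script. The sketch below uses it for feature detection; the helper name isCompressionFormatSupported is hypothetical and not part of this specification.

```javascript
// Hypothetical helper: reports whether the constructor accepts a format.
// An unsupported format makes the constructor throw a TypeError.
function isCompressionFormatSupported(format) {
  try {
    new CompressionStream(format);
    return true;
  } catch (error) {
    if (error instanceof TypeError) {
      return false;
    }
    throw error; // anything else is unexpected, so let it propagate
  }
}
```

Constructing a stream only to discard it is cheap, since no data is processed until chunks are written.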

The compress and enqueue a chunk algorithm, given a CompressionStream object cs and a chunk, runs these steps:

  1. If chunk is not a BufferSource type, then throw a TypeError.

  2. Let buffer be the result of compressing chunk with cs's format and context.

  3. If buffer is empty, return.

  4. Split buffer into one or more non-empty pieces and convert them into Uint8Arrays.

  5. For each Uint8Array array, enqueue array in cs's transform.

The compress flush and enqueue algorithm, which handles the end of data from the input ReadableStream object, given a CompressionStream object cs, runs these steps:

  1. Let buffer be the result of compressing an empty input with cs's format and context, with the finish flag.

  2. If buffer is empty, return.

  3. Split buffer into one or more non-empty pieces and convert them into Uint8Arrays.

  4. For each Uint8Array array, enqueue array in cs's transform.
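The shape of the two algorithms above can be modelled with a plain TransformStream. In the sketch below, context stands in for the opaque compression context; its process(bytes, finish) method, the function names, and the toy buffering context are all assumptions made for illustration. The toy context performs no compression at all. It only demonstrates that nothing is enqueued while the output buffer is empty and that the finish flag releases the remaining output.

```javascript
// Model of "compress and enqueue a chunk" and "compress flush and enqueue".
// `context.process(bytes, finish)` is a hypothetical stand-in for the
// compression context and must return a Uint8Array.
function createModelStream(context) {
  const run = (bytes, finish, controller) => {
    const buffer = context.process(bytes, finish);
    if (buffer.byteLength === 0) return; // empty buffer: enqueue nothing
    controller.enqueue(buffer); // this model emits a single piece
  };
  return new TransformStream({
    transform(chunk, controller) {
      let bytes;
      if (chunk instanceof ArrayBuffer) {
        bytes = new Uint8Array(chunk);
      } else if (ArrayBuffer.isView(chunk)) {
        bytes = new Uint8Array(chunk.buffer, chunk.byteOffset, chunk.byteLength);
      } else {
        // Simplified BufferSource check for this sketch.
        throw new TypeError('chunk is not a BufferSource');
      }
      run(bytes, false, controller);
    },
    flush(controller) {
      // End of input: process an empty input with the finish flag set.
      run(new Uint8Array(0), true, controller);
    },
  });
}

// Toy context, NOT a compressor: holds every byte until finish is set.
function createBufferingContext() {
  const pending = [];
  return {
    process(bytes, finish) {
      pending.push(...bytes);
      if (!finish) return new Uint8Array(0);
      return Uint8Array.from(pending.splice(0));
    },
  };
}
```

Writing [1, 2] and then [3, 4] to createModelStream(createBufferingContext()) and closing the writer yields one output chunk containing all four bytes, emitted by the flush step.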

6. Interface DecompressionStream

[Exposed=(Window,Worker)]
interface DecompressionStream {
  constructor(DOMString format);
};
DecompressionStream includes GenericTransformStream;

A DecompressionStream has an associated format and compression context context.

The new DecompressionStream(format) steps are:

  1. If format is unsupported in DecompressionStream, then throw a TypeError.

  2. Set this's format to format.

  3. Let transformAlgorithm be an algorithm which takes a chunk argument and runs the decompress and enqueue a chunk algorithm with this and chunk.

  4. Let flushAlgorithm be an algorithm which takes no argument and runs the decompress flush and enqueue algorithm with this.

  5. Set this's transform to a new TransformStream.

  6. Set up this's transform with transformAlgorithm set to transformAlgorithm and flushAlgorithm set to flushAlgorithm.

The decompress and enqueue a chunk algorithm, given a DecompressionStream object ds and a chunk, runs these steps:

  1. If chunk is not a BufferSource type, then throw a TypeError.

  2. Let buffer be the result of decompressing chunk with ds's format and context. If this results in an error, then throw a TypeError.

  3. If buffer is empty, return.

  4. Split buffer into one or more non-empty pieces and convert them into Uint8Arrays.

  5. For each Uint8Array array, enqueue array in ds's transform.

The decompress flush and enqueue algorithm, which handles the end of data from the input ReadableStream object, given a DecompressionStream object ds, runs these steps:

  1. Let buffer be the result of decompressing an empty input with ds's format and context, with the finish flag.

  2. If the end of the compressed input has not been reached, then throw a TypeError.

  3. If buffer is empty, return.

  4. Split buffer into one or more non-empty pieces and convert them into Uint8Arrays.

  5. For each Uint8Array array, enqueue array in ds's transform.
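Step 2 of the flush algorithm means that truncated input is reported as an error when the writable side is closed, rather than silently producing partial output. The following sketch shows one way a caller might observe this; the helper name tryDecompress and its result shape are hypothetical. It reports any error it sees and does not depend on the exact error type.

```javascript
// Hypothetical helper: attempts to decompress `bytes` and reports the outcome
// as { ok: true, data } or { ok: false, error }.
async function tryDecompress(bytes, format = 'gzip') {
  const ds = new DecompressionStream(format);
  const writer = ds.writable.getWriter();
  // Failures are observed on the readable side below, so rejections of the
  // write-side promises are deliberately ignored here.
  writer.write(bytes).catch(() => {});
  writer.close().catch(() => {});
  const reader = ds.readable.getReader();
  const parts = [];
  let total = 0;
  try {
    while (true) {
      const { value, done } = await reader.read();
      if (done) break;
      parts.push(value);
      total += value.byteLength;
    }
  } catch (error) {
    // Output read before the error is discarded: it is incomplete.
    return { ok: false, error };
  }
  const data = new Uint8Array(total);
  let offset = 0;
  for (const part of parts) {
    data.set(part, offset);
    offset += part.byteLength;
  }
  return { ok: true, data };
}
```

Note that some output chunks may be delivered before the error is detected, so callers should treat the result as invalid whenever the stream errors.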

7. Privacy and Security Considerations

The API doesn’t add any new privileges to the web platform.

However, web developers need to be careful in situations where an attacker can observe the length of the compressed data. Because the compressed length depends on the content, an attacker who can observe it may be able to guess the contents of the data.
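To make the concern concrete, the sketch below compresses a secret together with a guess and compares output sizes. All names and values here (compressedLength, probe, the token string) are hypothetical and exist only for this illustration. When the guess repeats the secret, the compressor can encode it as a reference to earlier data, so the output is expected to be shorter than for a wrong guess of the same length.

```javascript
// Hypothetical helper: returns the number of compressed bytes for `text`.
async function compressedLength(text, format = 'deflate') {
  const cs = new CompressionStream(format);
  const writer = cs.writable.getWriter();
  writer.write(new TextEncoder().encode(text)).catch(() => {});
  writer.close().catch(() => {});
  let total = 0;
  const reader = cs.readable.getReader();
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    total += value.byteLength;
  }
  return total;
}

// Made-up secret for the demonstration.
const secret = 'token=7f3a9c1e5b2d8064';

// Models data in which a secret and attacker-influenced text are compressed
// together, with only the resulting length visible to the attacker.
async function probe(guess) {
  return compressedLength(`${secret};${guess}`);
}
```

Comparing probe('token=7f3a9c1e5b2d8064') with probe('token=QWZKJVPLMXNRTYGH') shows the matching guess producing the smaller output. Keeping secrets and attacker-controlled data in separate compression contexts is one way to avoid this kind of leak.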

8. Examples

8.1. Gzip-compress a stream

const compressedReadableStream
    = inputReadableStream.pipeThrough(new CompressionStream('gzip'));

8.2. Deflate-compress an ArrayBuffer to a Uint8Array

async function compressArrayBuffer(input) {
  const cs = new CompressionStream('deflate');
  const writer = cs.writable.getWriter();
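  // The write and close promises are deliberately not awaited: they may not
  // settle until the readable side is drained by the loop below.
  writer.write(input);
  writer.close();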
  const output = [];
  const reader = cs.readable.getReader();
  let totalSize = 0;
  while (true) {
    const { value, done } = await reader.read();
    if (done)
      break;
    output.push(value);
    totalSize += value.byteLength;
  }
  const concatenated = new Uint8Array(totalSize);
  let offset = 0;
  for (const array of output) {
    concatenated.set(array, offset);
    offset += array.byteLength;
  }
  return concatenated;
}

8.3. Gzip-decompress a Blob to Blob

function decompressBlob(blob) {
  const ds = new DecompressionStream('gzip');
  const decompressedStream = blob.stream().pipeThrough(ds);
  return new Response(decompressedStream).blob();
}

9. Acknowledgments

The editors wish to thank Domenic Denicola and Yutaka Hirano for their support.

References

Normative References

[RFC1950]
P. Deutsch; J-L. Gailly. ZLIB Compressed Data Format Specification version 3.3. May 1996. Informational. URL: https://datatracker.ietf.org/doc/html/rfc1950
[RFC1952]
P. Deutsch. GZIP file format specification version 4.3. May 1996. Informational. URL: https://datatracker.ietf.org/doc/html/rfc1952
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WebIDL]
Boris Zbarsky. Web IDL. 15 December 2016. ED. URL: https://heycam.github.io/webidl/
[WHATWG-STREAMS]
Adam Rice; Domenic Denicola; 吉野剛史 (Takeshi Yoshino). Streams Standard. Living Standard. URL: https://streams.spec.whatwg.org/

Informative References

[RFC7230]
R. Fielding, Ed.; J. Reschke, Ed. Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing. June 2014. Proposed Standard. URL: https://httpwg.org/specs/rfc7230.html

IDL Index

[Exposed=(Window,Worker)]
interface CompressionStream {
  constructor(DOMString format);
};
CompressionStream includes GenericTransformStream;

[Exposed=(Window,Worker)]
interface DecompressionStream {
  constructor(DOMString format);
};
DecompressionStream includes GenericTransformStream;