Accelerated Shape Detection in Images

Editor’s Draft,

This version:
https://wicg.github.io/shape-detection-api
Issue Tracking:
GitHub
Editor:
(Google Inc.)
Translations (non-normative and likely out-of-date):
简体中文
Participate:
Join the W3C Community Group
Fix the text through GitHub

Abstract

This document describes an API providing access to accelerated shape detectors (e.g. human faces) for still images and/or live image feeds.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

Photos and images constitute the largest chunk of the Web, and many include recognisable features, such as human faces, QR codes or text. Detecting these features is computationally expensive, but enables interesting use cases such as face tagging or web URL redirection. While hardware manufacturers have supported these features for a long time, Web Apps do not yet have access to these hardware capabilities, which makes the use of computationally demanding libraries necessary.

1.1. Shape detection use cases

Please see the Readme/Explainer in the repository.

2. Shape Detection API

Individual browsers MAY provide Detectors indicating the availability of hardware providing accelerated operation.

2.1. Image sources for detection

This section is inspired by HTML Canvas 2D Context §image-sources-for-2d-rendering-contexts.

ImageBitmapSource allows objects implementing any of a number of interfaces to be used as image sources for the detection process.

When the UA is required to use a given type of ImageBitmapSource as input argument for the detect() method of any detector, it MUST run these steps:

Note that if the ImageBitmapSource is an object with either a horizontal dimension or a vertical dimension equal to zero, then the Promise will be simply resolved with an empty sequence of detected objects.
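
The zero-dimension rule in the note above can be illustrated in script. This is only a sketch of the described behaviour, not the UA algorithm: hasZeroDimension and detectOrEmpty are hypothetical helper names, and the check assumes the source exposes numeric width and height properties (as an ImageBitmap, ImageData or canvas does). Sources without those properties are passed straight to the detector.

```javascript
// Hypothetical helper: true when the source reports a zero width or height.
// Assumes numeric `width`/`height` properties; other sources yield false.
function hasZeroDimension(source) {
  return source.width === 0 || source.height === 0;
}

// Hypothetical wrapper mirroring the note: a zero-sized input resolves with
// an empty sequence and the detector is never invoked.
function detectOrEmpty(detector, source) {
  if (hasZeroDimension(source)) {
    return Promise.resolve([]);
  }
  return detector.detect(source);
}
```

Any object with a detect() method can be passed as detector, which also makes the wrapper easy to exercise with a stub.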

2.2. Face Detection API

FaceDetector represents an underlying accelerated platform’s component for detection of human faces in images. It can be created with an optional Dictionary of FaceDetectorOptions. It provides a single detect() operation on an ImageBitmapSource whose result is a Promise. This method MUST reject this Promise in the cases detailed in §2.1 Image sources for detection; otherwise it MAY queue a task that uses the OS/Platform resources to resolve the Promise with a sequence of DetectedFaces, each one essentially consisting of and delimited by a boundingBox.

dictionary FaceDetectorOptions {
  unsigned short maxDetectedFaces;
  boolean fastMode;
};
maxDetectedFaces, of type unsigned short
Maximum number of detected faces to be identified in the scene.
fastMode, of type boolean
Hint to the UA to try and prioritise speed over accuracy by e.g. operating on a reduced scale or looking for large features.
[Exposed=(Window,Worker), Constructor(optional FaceDetectorOptions faceDetectorOptions)]
interface FaceDetector {
  Promise<sequence<DetectedFace>> detect(ImageBitmapSource image);
};
FaceDetector(optional FaceDetectorOptions faceDetectorOptions)
Constructs a new FaceDetector with the optional faceDetectorOptions.
detect()
Tries to detect human faces in the ImageBitmapSource image. The detected faces, if any, are returned as a sequence of DetectedFaces.
interface DetectedFace {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
};
boundingBox, of type DOMRectReadOnly, readonly
A rectangle indicating the position and extent of a detected feature aligned to the image axes.
Example implementations of face detection are e.g. Android FaceDetector, Apple’s CIFaceFeature or Windows 10 FaceDetector.
Consider adding attributes such as, e.g.:
[SameObject] readonly attribute unsigned long id;
[SameObject] readonly attribute FrozenArray<Landmark>? landmarks;

to DetectedFace.

2.3. Barcode Detection API

BarcodeDetector represents an underlying accelerated platform’s component for detection of QR codes or barcodes in images. It provides a single detect() operation on an ImageBitmapSource whose result is a Promise. This method MUST reject this Promise in the cases detailed in §2.1 Image sources for detection; otherwise it MAY queue a task using the OS/Platform resources to resolve the Promise with a sequence of DetectedBarcodes, each one essentially consisting of and delimited by a boundingBox and a series of Point2Ds, and possibly a rawValue decoded DOMString.

[Exposed=(Window,Worker), Constructor()]
interface BarcodeDetector {
  Promise<sequence<DetectedBarcode>> detect(ImageBitmapSource image);
};
detect(ImageBitmapSource image)
Tries to detect barcodes in the ImageBitmapSource image.
interface DetectedBarcode {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
  [SameObject] readonly attribute DOMString rawValue;
  [SameObject] readonly attribute FrozenArray<Point2D> cornerPoints;
};
boundingBox, of type DOMRectReadOnly, readonly
A rectangle indicating the position and extent of a detected feature aligned to the image axes.
rawValue, of type DOMString, readonly
String decoded from the barcode. This value might be multiline.
cornerPoints, of type FrozenArray<Point2D>, readonly
A sequence of corner points of the detected barcode, in clockwise direction and starting with top-left. This is not necessarily a square due to possible perspective distortions.
Example implementations of Barcode/QR code detection are e.g. Google Play Services or Apple’s CIQRCodeFeature.
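
To illustrate how cornerPoints and an axis-aligned rectangle can relate, the following sketch computes the smallest axis-aligned box enclosing a list of corner points. It assumes each Point2D exposes numeric x and y members, which this section does not define, and enclosingBox is a hypothetical helper. This document does not state that boundingBox is derived this way; the sketch only shows one plausible relationship.

```javascript
// Hypothetical helper: smallest axis-aligned box enclosing {x, y} points.
function enclosingBox(cornerPoints) {
  const xs = cornerPoints.map(p => p.x);
  const ys = cornerPoints.map(p => p.y);
  const left = Math.min(...xs);
  const top = Math.min(...ys);
  return {
    x: left,
    y: top,
    width: Math.max(...xs) - left,
    height: Math.max(...ys) - top,
  };
}

// A barcode seen under perspective, clockwise starting with top-left.
const corners = [{x: 10, y: 20}, {x: 110, y: 30}, {x: 100, y: 90}, {x: 5, y: 80}];
console.log(enclosingBox(corners));
// → { x: 5, y: 20, width: 105, height: 70 }
```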

2.4. Text Detection API

TextDetector represents an underlying accelerated platform’s component for detection of text in images. It provides a single detect() operation on an ImageBitmapSource whose result is a Promise. This method MUST reject this Promise in the cases detailed in §2.1 Image sources for detection; otherwise it MAY queue a task using the OS/Platform resources to resolve the Promise with a sequence of DetectedTexts, each one essentially consisting of a rawValue and delimited by a boundingBox.

[
    Constructor,
    Exposed=(Window,Worker),
] interface TextDetector {
    Promise<sequence<DetectedText>> detect(ImageBitmapSource image);
};
detect(ImageBitmapSource image)
Tries to detect text blocks in the ImageBitmapSource image.
[
    Constructor,
] interface DetectedText {
    [SameObject] readonly attribute DOMRect boundingBox;
    [SameObject] readonly attribute DOMString rawValue;
};
boundingBox, of type DOMRect, readonly
A rectangle indicating the position and extent of a detected feature aligned to the image axes.
rawValue, of type DOMString, readonly
Raw string detected from the image.
Example implementations of text detection are e.g. Google Play Services, Apple’s CIDetector or Windows 10 OCR API.

3. Examples

Slightly modified/extended versions of these examples (and more) can be found in e.g. this codepen collection.

3.1. Platform support for a given detector

The following example can also be found in e.g. this codepen with minimal modifications.
if (window.FaceDetector == undefined) {
  console.error('Face Detection not supported on this platform');
}
if (window.BarcodeDetector == undefined) {
  console.error('Barcode Detection not supported on this platform');
}
if (window.TextDetector == undefined) {
  console.error('Text Detection not supported on this platform');
}
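
Because the detectors are exposed to both Window and Worker, a check written against window does not run inside a worker. The following sketch performs the same test against the global object of whichever context it runs in. supportedDetectors is a hypothetical helper name, and the optional scope parameter exists only so the function can be exercised with a stand-in object.

```javascript
// Names of the detector interfaces defined in this document.
const DETECTOR_NAMES = ['FaceDetector', 'BarcodeDetector', 'TextDetector'];

// Hypothetical helper: returns the detector names exposed on `scope`.
// `globalThis` is the global object in both windows and workers.
function supportedDetectors(scope = globalThis) {
  return DETECTOR_NAMES.filter(name => typeof scope[name] === 'function');
}

console.log('Supported detectors:', supportedDetectors());
```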

3.2. Face Detection

The following example can also be found in e.g. this codepen.
let faceDetector = new FaceDetector({fastMode: true, maxDetectedFaces: 1});
// Assuming |theImage| is e.g. an <img> element, or a Blob.

faceDetector.detect(theImage)
.then(detectedFaces => {
  for (const face of detectedFaces) {
    console.log(` Face @ (${face.boundingBox.x}, ${face.boundingBox.y}),` +
        ` size ${face.boundingBox.width}x${face.boundingBox.height}`);
  }
}).catch(() => {
  console.error("Face Detection failed, boo.");
});
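
When more than one face is returned, a page may only be interested in the most prominent one. A minimal sketch follows, assuming that prominence can be approximated by the area of the boundingBox. That criterion is an assumption of this example and is not defined by this document; largestFace is a hypothetical helper.

```javascript
// Hypothetical helper: returns the detected face whose boundingBox covers
// the largest area, or null when the sequence is empty.
function largestFace(detectedFaces) {
  let best = null;
  let bestArea = -1;
  for (const face of detectedFaces) {
    const area = face.boundingBox.width * face.boundingBox.height;
    if (area > bestArea) {
      best = face;
      bestArea = area;
    }
  }
  return best;
}
```

It could be called from the then() handler above, e.g. largestFace(detectedFaces).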

3.3. Barcode Detection

The following example can also be found in e.g. this codepen.
let barcodeDetector = new BarcodeDetector();
// Assuming |theImage| is e.g. an <img> element, or a Blob.

barcodeDetector.detect(theImage)
.then(detectedCodes => {
  for (const barcode of detectedCodes) {
    console.log(` Barcode ${barcode.rawValue}` +
        ` @ (${barcode.boundingBox.x}, ${barcode.boundingBox.y}) with size` +
        ` ${barcode.boundingBox.width}x${barcode.boundingBox.height}`);
  }
}).catch(() => {
  console.error("Barcode Detection failed, boo.");
});
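
The introduction mentions web URL redirection as a use case. The sketch below shows one way a page might pick out the detected barcodes whose rawValue is an absolute http(s) URL. barcodeUrls is a hypothetical helper, and restricting the result to http and https is a choice made for this example, not something this document requires.

```javascript
// Hypothetical helper: returns the http(s) URLs found among the rawValue
// strings of the detected barcodes, in detection order.
function barcodeUrls(detectedBarcodes) {
  const urls = [];
  for (const barcode of detectedBarcodes) {
    try {
      const url = new URL(barcode.rawValue);
      if (url.protocol === 'http:' || url.protocol === 'https:') {
        urls.push(url.href);
      }
    } catch (e) {
      // rawValue is not an absolute URL (e.g. a product number); skip it.
    }
  }
  return urls;
}
```

A page should still ask the user before navigating to a URL obtained this way, since the content of a barcode is untrusted input.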

3.4. Text Detection

The following example can also be found in e.g. this codepen.
let textDetector = new TextDetector();
// Assuming |theImage| is e.g. an <img> element, or a Blob.

textDetector.detect(theImage)
.then(detectedTextBlocks => {
  for (const textBlock of detectedTextBlocks) {
    console.log(
        `text @ (${textBlock.boundingBox.x}, ${textBlock.boundingBox.y}), ` +
        `size ${textBlock.boundingBox.width}x${textBlock.boundingBox.height}`);
  }
}).catch(() => {
  console.error("Text Detection failed, boo.");
});
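
This document does not define the order in which detected text blocks are returned. The following sketch joins their rawValue strings in a naive reading order, top to bottom and then left to right by the boundingBox origin. That ordering is an assumption suited to simple left-to-right layouts only, and textInReadingOrder is a hypothetical helper.

```javascript
// Hypothetical helper: concatenates rawValue strings, ordering blocks by the
// top edge of their boundingBox and breaking ties by the left edge.
function textInReadingOrder(detectedTextBlocks) {
  return [...detectedTextBlocks]  // copy, so the input sequence is not mutated
    .sort((a, b) =>
      a.boundingBox.y - b.boundingBox.y || a.boundingBox.x - b.boundingBox.x)
    .map(block => block.rawValue)
    .join('\n');
}
```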

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[GEOMETRY-1]
Simon Pieters; Dirk Schulze; Rik Cabanier. Geometry Interfaces Module Level 1. URL: https://www.w3.org/TR/geometry-1/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WebIDL]
Cameron McCormack; Boris Zbarsky; Tobie Langel. Web IDL. URL: https://www.w3.org/TR/WebIDL-1/

IDL Index

dictionary FaceDetectorOptions {
  unsigned short maxDetectedFaces;
  boolean fastMode;
};

[Exposed=(Window,Worker), Constructor(optional FaceDetectorOptions faceDetectorOptions)]
interface FaceDetector {
  Promise<sequence<DetectedFace>> detect(ImageBitmapSource image);
};

interface DetectedFace {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
};

[Exposed=(Window,Worker), Constructor()]
interface BarcodeDetector {
  Promise<sequence<DetectedBarcode>> detect(ImageBitmapSource image);
};

interface DetectedBarcode {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
  [SameObject] readonly attribute DOMString rawValue;
  [SameObject] readonly attribute FrozenArray<Point2D> cornerPoints;
};

[
    Constructor,
    Exposed=(Window,Worker),
] interface TextDetector {
    Promise<sequence<DetectedText>> detect(ImageBitmapSource image);
};

[
    Constructor,
] interface DetectedText {
    [SameObject] readonly attribute DOMRect boundingBox;
    [SameObject] readonly attribute DOMString rawValue;
};