1. Introduction
Photos and images constitute the largest chunk of the Web, and many include recognisable features, such as human faces, QR codes or text. Detecting these features is computationally expensive, but enables interesting use cases such as face tagging or web URL redirection. Although hardware manufacturers have supported these features for a long time, Web Apps do not yet have access to these hardware capabilities, which makes computationally demanding libraries necessary.
1.1. Shape detection use cases
Please see the Readme/Explainer in the repository.
2. Shape Detection API
Individual browsers MAY provide Detectors indicating the availability of hardware providing accelerated operation.
2.1. Image sources for detection
This section is inspired by HTML Canvas 2D Context §image-sources-for-2d-rendering-contexts.
ImageBitmapSource allows objects implementing any of a number of interfaces to be used as image sources for the detection process.
- When an ImageBitmapSource object represents an HTMLImageElement, the element’s image must be used as the source image. Specifically, when an ImageBitmapSource object represents an animated image in an HTMLImageElement, the user agent must use the default image of the animation (the one that the format defines is to be used when animation is not supported or is disabled), or, if there is no such image, the first frame of the animation.
- When an ImageBitmapSource object represents an HTMLVideoElement, then the frame at the current playback position when the method with the argument is invoked must be used as the source image when processing the image, and the source image’s dimensions must be the intrinsic dimensions of the media resource (i.e. after any aspect-ratio correction has been applied).
- When an ImageBitmapSource object represents an HTMLCanvasElement, the element’s bitmap must be used as the source image.
When the UA is required to use a given type of ImageBitmapSource as input argument for the detect() method of any detector, it MUST run these steps:
- If the ImageBitmapSource has an effective script origin (HTML Standard §concept-origin) which is not the same as the Document’s effective script origin, then reject the Promise with a new DOMException whose name is SecurityError.
- If the ImageBitmapSource is an HTMLImageElement object that is in the broken state, then reject the Promise with a new DOMException whose name is InvalidStateError, and abort any further steps.
- If the ImageBitmapSource is an HTMLImageElement object that is not fully decodable, then reject the Promise with a new DOMException whose name is InvalidStateError, and abort any further steps.
- If the ImageBitmapSource is an HTMLVideoElement object whose readyState attribute is either HAVE_NOTHING or HAVE_METADATA, then reject the Promise with a new DOMException whose name is InvalidStateError, and abort any further steps.
- If the ImageBitmapSource argument is an HTMLCanvasElement whose bitmap’s origin-clean (HTML Standard §concept-canvas-origin-clean) flag is false, then reject the Promise with a new DOMException whose name is SecurityError, and abort any further steps.
Note that if the ImageBitmapSource is an object with either a horizontal dimension or a vertical dimension equal to zero, then the Promise will be simply resolved with an empty sequence of detected objects.
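The checks above can be sketched as a pure function. This is an illustration only, not the normative algorithm: the function name `classifySource` and the plain descriptor object it takes (`kind`, `sameOrigin`, `broken`, `decodable`, `readyState`, `originClean`, `width`, `height`) are hypothetical stand-ins for state that a user agent holds internally, and the ordering of the checks simply follows the list.

```javascript
// HTMLMediaElement readyState values, as defined by the HTML Standard.
const HAVE_NOTHING = 0;
const HAVE_METADATA = 1;

// Returns the DOMException name that detect() would reject with, 'empty'
// when the Promise would resolve with an empty sequence, or 'ok' when
// detection may proceed. `source` is a hypothetical descriptor object,
// not a real ImageBitmapSource.
function classifySource(source) {
  if (!source.sameOrigin) return 'SecurityError';
  if (source.kind === 'image' && source.broken) return 'InvalidStateError';
  if (source.kind === 'image' && !source.decodable) return 'InvalidStateError';
  if (source.kind === 'video' &&
      (source.readyState === HAVE_NOTHING ||
       source.readyState === HAVE_METADATA)) {
    return 'InvalidStateError';
  }
  if (source.kind === 'canvas' && !source.originClean) return 'SecurityError';
  // Zero-sized sources are not an error: the result is an empty sequence.
  if (source.width === 0 || source.height === 0) return 'empty';
  return 'ok';
}
```

For instance, a descriptor such as `{kind: 'video', sameOrigin: true, readyState: 1, width: 640, height: 480}` is classified as `InvalidStateError`, because no frame is available yet at HAVE_METADATA.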
2.2. Face Detection API
FaceDetector represents an underlying accelerated platform’s component for detection of human faces in images. It can be created with an optional dictionary of FaceDetectorOptions. It provides a single detect() operation on an ImageBitmapSource, the result of which is a Promise. This method MUST reject this Promise in the cases detailed in §2.1 Image sources for detection; otherwise it MAY queue a task that utilizes the OS/Platform resources to resolve the Promise with a sequence of DetectedFaces, each one essentially consisting of and delimited by a boundingBox.
dictionary FaceDetectorOptions {
  unsigned short maxDetectedFaces;
  boolean fastMode;
};
maxDetectedFaces, of type unsigned short: Maximum number of detected faces to be identified in the scene.
fastMode, of type boolean: Hint to the UA to try and prioritise speed over accuracy by e.g. operating on a reduced scale or looking for large features.
[Exposed=(Window,Worker), Constructor(optional FaceDetectorOptions faceDetectorOptions)]
interface FaceDetector {
  Promise<sequence<DetectedFace>> detect(ImageBitmapSource image);
};
FaceDetector(optional FaceDetectorOptions faceDetectorOptions): Constructs a new FaceDetector with the optional faceDetectorOptions.
detect(): Tries to detect human faces in the ImageBitmapSource image. The detected faces, if any, are returned as a sequence of DetectedFaces.
interface DetectedFace {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
};
boundingBox, of type DOMRectReadOnly, readonly: A rectangle indicating the position and extent of a detected feature aligned to the image axes.
Consider adding attributes such as:
  [SameObject] readonly attribute unsigned long id;
  [SameObject] readonly attribute FrozenArray<Landmark>? landmarks;
to DetectedFace.
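A common use of boundingBox is drawing an overlay on top of the displayed image. The sketch below rescales a box to the displayed size. It assumes that the box is expressed in the source image's own pixel coordinates, which this section does not state explicitly; the helper name `scaleBoundingBox` and the `{width, height}` size objects are hypothetical.

```javascript
// Rescales a bounding box from source-image coordinates to the coordinates
// of the element the image is displayed in. `box` only needs x, y, width
// and height, so a DOMRectReadOnly or a plain object both work.
function scaleBoundingBox(box, sourceSize, displaySize) {
  const scaleX = displaySize.width / sourceSize.width;
  const scaleY = displaySize.height / sourceSize.height;
  return {
    x: box.x * scaleX,
    y: box.y * scaleY,
    width: box.width * scaleX,
    height: box.height * scaleY,
  };
}
```

For an `<img>` element, the source size would typically come from `naturalWidth` and `naturalHeight`, and the display size from `clientWidth` and `clientHeight`.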
2.3. Barcode Detection API
BarcodeDetector represents an underlying accelerated platform’s component for detection in images of QR codes or barcodes. It provides a single detect() operation on an ImageBitmapSource, the result of which is a Promise. This method MUST reject this Promise in the cases detailed in §2.1 Image sources for detection; otherwise it MAY queue a task using the OS/Platform resources to resolve the Promise with a sequence of DetectedBarcodes, each one essentially consisting of and delimited by a boundingBox and a series of Point2Ds, and possibly a rawValue decoded DOMString.
[Exposed=(Window,Worker), Constructor()]
interface BarcodeDetector {
  Promise<sequence<DetectedBarcode>> detect(ImageBitmapSource image);
};
detect(ImageBitmapSource image): Tries to detect barcodes in the ImageBitmapSource image.
interface DetectedBarcode {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
  [SameObject] readonly attribute DOMString rawValue;
  [SameObject] readonly attribute FrozenArray<Point2D> cornerPoints;
};
boundingBox, of type DOMRectReadOnly, readonly: A rectangle indicating the position and extent of a detected feature aligned to the image axes.
rawValue, of type DOMString, readonly: String decoded from the barcode. This value might be multiline.
cornerPoints, of type FrozenArray<Point2D>, readonly: A sequence of corner points of the detected barcode, in clockwise direction and starting with top-left. This is not necessarily a square due to possible perspective distortions.
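Because cornerPoints need not form an axis-aligned square, it can be useful to compare the outline with its boundingBox. The sketch below computes the outline's area with the shoelace formula and divides it by the box area. It assumes each Point2D exposes numeric `x` and `y` members, which this section does not define; `polygonArea` and `fillRatio` are hypothetical helper names.

```javascript
// Shoelace formula: area of a simple polygon whose vertices are given in
// order (clockwise or counter-clockwise).
function polygonArea(points) {
  let twiceArea = 0;
  for (let i = 0; i < points.length; i++) {
    const a = points[i];
    const b = points[(i + 1) % points.length];
    twiceArea += a.x * b.y - b.x * a.y;
  }
  return Math.abs(twiceArea) / 2;
}

// Ratio of the barcode outline's area to the area of its bounding box.
// Returns 0 for a degenerate (zero-area) bounding box.
function fillRatio(barcode) {
  const boxArea = barcode.boundingBox.width * barcode.boundingBox.height;
  return boxArea === 0 ? 0 : polygonArea(barcode.cornerPoints) / boxArea;
}
```

A ratio close to 1 suggests an outline that fills its box, while a noticeably lower ratio suggests that the code is rotated or seen in perspective.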
2.4. Text Detection API
TextDetector represents an underlying accelerated platform’s component for detection in images of text. It provides a single detect() operation on an ImageBitmapSource, the result of which is a Promise. This method MUST reject this Promise in the cases detailed in §2.1 Image sources for detection; otherwise it MAY queue a task using the OS/Platform resources to resolve the Promise with a sequence of DetectedTexts, each one essentially consisting of a rawValue and delimited by a boundingBox.
[
  Constructor,
  Exposed=(Window,Worker)
] interface TextDetector {
  Promise<sequence<DetectedText>> detect(ImageBitmapSource image);
};
detect(ImageBitmapSource image): Tries to detect text blocks in the ImageBitmapSource image.
[
  Constructor
] interface DetectedText {
  [SameObject] readonly attribute DOMRect boundingBox;
  [SameObject] readonly attribute DOMString rawValue;
};
boundingBox, of type DOMRect, readonly: A rectangle indicating the position and extent of a detected feature aligned to the image axes.
rawValue, of type DOMString, readonly: Raw string detected from the image.
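This section does not say in what order detected text blocks are returned, so an application that wants to reassemble a passage may need to order them itself. The following is a minimal sketch that groups blocks into horizontal bands by the top edge of their boundingBox and then sorts left to right within each band. The helper name `inReadingOrder` and the `rowHeight` parameter are hypothetical, and the approach assumes left-to-right, top-to-bottom text with roughly uniform line height.

```javascript
// Returns a new array of detected text blocks sorted top-to-bottom by band,
// then left-to-right inside a band. `rowHeight` is the band height in the
// same units as boundingBox; blocks whose top edges fall in the same band
// are treated as being on the same line.
function inReadingOrder(blocks, rowHeight = 20) {
  const band = block => Math.floor(block.boundingBox.y / rowHeight);
  return [...blocks].sort((a, b) =>
    band(a) - band(b) || a.boundingBox.x - b.boundingBox.x);
}
```

Joining the result with `inReadingOrder(blocks).map(b => b.rawValue).join(' ')` gives an approximate transcript. Blocks that straddle a band boundary can still be misordered, so `rowHeight` would need tuning per image.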
3. Examples
Slightly modified/extended versions of these examples (and more) can be found in e.g. this codepen collection.
3.1. Platform support for a given detector
if (window.FaceDetector == undefined) {
  console.error('Face Detection not supported on this platform');
}
if (window.BarcodeDetector == undefined) {
  console.error('Barcode Detection not supported on this platform');
}
if (window.TextDetector == undefined) {
  console.error('Text Detection not supported on this platform');
}
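Since the detectors are exposed to both Window and Worker, a check against `window` does not work inside a worker. The sketch below performs the same feature test against `globalThis`, which exists in both contexts. The helper name `supportedDetectors` and its optional `scope` parameter are hypothetical and exist mainly to make the check easy to exercise.

```javascript
const DETECTOR_NAMES = ['FaceDetector', 'BarcodeDetector', 'TextDetector'];

// Returns the names of the detector constructors available on `scope`.
// `scope` defaults to the global object of the current context.
function supportedDetectors(scope = globalThis) {
  return DETECTOR_NAMES.filter(name => typeof scope[name] === 'function');
}
```

Calling `supportedDetectors()` on a platform with no detectors returns an empty array. Note that the presence of a constructor only indicates that the interface exists, not that a given image will be processed successfully.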
3.2. Face Detection
const faceDetector = new FaceDetector({fastMode: true, maxDetectedFaces: 1});
// Assuming |theImage| is e.g. an <img> element, or a Blob.
faceDetector.detect(theImage)
  .then(detectedFaces => {
    for (const face of detectedFaces) {
      // Template literals need backticks for ${...} to be interpolated.
      console.log(
        ` Face @ (${face.boundingBox.x}, ${face.boundingBox.y}),` +
        ` size ${face.boundingBox.width}x${face.boundingBox.height}`);
    }
  }).catch(() => {
    console.error("Face Detection failed, boo.");
  });
3.3. Barcode Detection
const barcodeDetector = new BarcodeDetector();
// Assuming |theImage| is e.g. an <img> element, or a Blob.
barcodeDetector.detect(theImage)
  .then(detectedCodes => {
    for (const barcode of detectedCodes) {
      // Template literals need backticks for ${...} to be interpolated.
      console.log(
        ` Barcode ${barcode.rawValue}` +
        ` @ (${barcode.boundingBox.x}, ${barcode.boundingBox.y}) with size` +
        ` ${barcode.boundingBox.width}x${barcode.boundingBox.height}`);
    }
  }).catch(() => {
    console.error("Barcode Detection failed, boo.");
  });
3.4. Text Detection
const textDetector = new TextDetector();
// Assuming |theImage| is e.g. an <img> element, or a Blob.
textDetector.detect(theImage)
  .then(detectedTextBlocks => {
    for (const textBlock of detectedTextBlocks) {
      // Template literals need backticks for ${...} to be interpolated.
      console.log(
        `text @ (${textBlock.boundingBox.x}, ${textBlock.boundingBox.y}), ` +
        `size ${textBlock.boundingBox.width}x${textBlock.boundingBox.height}`);
    }
  }).catch(() => {
    console.error("Text Detection failed, boo.");
  });
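The three examples share the same shape, so they can be folded into one helper written with async/await. This is a sketch rather than a recommended pattern: the name `detectOrEmpty` is hypothetical, and treating both an unsupported platform and a rejected Promise as "no results" is a design assumption that suits some applications and hides errors in others.

```javascript
// Runs a detector and resolves with its results, or with an empty array
// when the constructor is unavailable or detection fails.
// `DetectorCtor` is passed in (e.g. globalThis.BarcodeDetector) so the
// helper can be used with any of the three detectors.
async function detectOrEmpty(DetectorCtor, image, options) {
  if (typeof DetectorCtor !== 'function') {
    return [];
  }
  try {
    const detector = new DetectorCtor(options);
    return await detector.detect(image);
  } catch (error) {
    console.error(`Detection failed: ${error.name}`);
    return [];
  }
}
```

For example, `detectOrEmpty(globalThis.FaceDetector, theImage, {fastMode: true})` resolves with the detected faces where the detector exists and with an empty array elsewhere.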