Accelerated Text Detection in Images

Draft Community Group Report,

This version:
https://wicg.github.io/shape-detection-api
Issue Tracking:
GitHub
Editors:
(Google LLC)
(Google LLC)
Participate:
Join the W3C Community Group
Fix the text through GitHub

Abstract

This document describes an API providing access to accelerated text detectors for still images and/or live image feeds.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

1. Introduction

Photos and images constitute the largest chunk of the Web, and many of them include recognisable features such as human faces, QR codes or text. Detecting these features is computationally expensive, but it enables interesting use cases, e.g. face tagging or web URL redirection. This document deals with text detection, whereas the sister document [SHAPE-DETECTION-API] specifies the face and barcode detection cases and APIs.

1.1. Text detection use cases

Please see the Readme/Explainer in the repository.

2. Text Detection API

Individual browsers MAY provide Detectors to indicate the availability of hardware that provides accelerated operation.

2.1. Image sources for detection

Please refer to Accelerated Shape Detection in Images § 2.1 Image sources for detection.

2.2. Text Detection API

TextDetector represents an underlying accelerated platform component for detecting Latin-1 text, as defined in [iso8859-1], in images. It provides a single detect() operation that takes an ImageBitmapSource and returns a Promise. This method must reject this Promise in the cases detailed in § 2.1 Image sources for detection; otherwise it may queue a task that uses the OS/platform resources to resolve the Promise with a sequence of DetectedTexts, each consisting of a rawValue and delimited by a boundingBox and a series of Point2Ds (its cornerPoints).
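Since rawValue is limited to characters drawn from [iso8859-1], a script that post-processes results may want to verify that assumption. The following is a minimal sketch. isLatin1 is a hypothetical helper, not part of this API, and it treats any code point up to U+00FF as Latin-1, which is only a coarse approximation of that character set.

```javascript
// Hypothetical helper (not part of the TextDetector API): returns true when
// every character of |text| has a code point in U+0000..U+00FF, the range
// that ISO-8859-1 (Latin-1) maps onto. This is a coarse sanity check only.
function isLatin1(text) {
  // Iterating a string with for...of yields whole code points, so astral
  // characters such as emoji are seen as one code point above U+00FF.
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) {
      return false;
    }
  }
  return true;
}

console.log(isLatin1('Café 42')); // true
console.log(isLatin1('Привет'));  // false
```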

Example implementations of text detection include Google Play Services, Apple’s CIDetector (bounding box only, no OCR) and the Windows 10 OCR API.
[
    Exposed=(Window,Worker),
    SecureContext
] interface TextDetector {
    constructor();
    Promise<sequence<DetectedText>> detect(ImageBitmapSource image);
};
TextDetector()
Detectors may potentially allocate and hold significant resources. Where possible, reuse the same TextDetector for several detections.
detect(ImageBitmapSource image)
Tries to detect text blocks in the ImageBitmapSource image.
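The advice to reuse a detector can be sketched as a lazily created, shared instance. This is an illustration under stated assumptions, not a required pattern: getTextDetector and detectAll are hypothetical helpers, and the constructor parameter exists only so that a stand-in detector can be injected where TextDetector is unavailable (e.g. for testing).

```javascript
// Hypothetical helpers: keep one shared detector for the whole page.
let sharedDetector = null;

// |DetectorCtor| defaults to the platform's TextDetector; a stand-in class
// can be passed instead when testing outside a supporting browser.
function getTextDetector(DetectorCtor = globalThis.TextDetector) {
  if (typeof DetectorCtor !== 'function') {
    throw new Error('TextDetector is not supported on this platform');
  }
  if (sharedDetector === null) {
    // Constructing a detector may allocate significant resources,
    // so do it at most once and hand out the same instance afterwards.
    sharedDetector = new DetectorCtor();
  }
  return sharedDetector;
}

// Runs detection on several images, one at a time, with the same detector.
async function detectAll(images, DetectorCtor) {
  const detector = getTextDetector(DetectorCtor);
  const results = [];
  for (const image of images) {
    results.push(await detector.detect(image));
  }
  return results;
}
```

Images are processed sequentially here; issuing all detect() calls at once with Promise.all() is an equally valid design, and which one is preferable likely depends on the platform.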

2.2.1. DetectedText

dictionary DetectedText {
  required DOMRectReadOnly boundingBox;
  required DOMString rawValue;
  required FrozenArray<Point2D> cornerPoints;
};
boundingBox, of type DOMRectReadOnly
A rectangle indicating the position and extent of a detected feature, aligned to the image.
rawValue, of type DOMString
Raw string detected from the image, where characters are drawn from [iso8859-1].
cornerPoints, of type FrozenArray<Point2D>
A sequence of corner points of the detected feature, in clockwise order and starting with the top-left corner. The resulting quadrilateral is not necessarily a rectangle, due to possible perspective distortions.
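The cornerPoints describe a quadrilateral in image coordinates, where y grows downwards. A minimal sketch of hypothetical helpers follows: enclosingRect derives the smallest axis-aligned rectangle containing the points (which may differ from the boundingBox a platform reports), and isClockwise uses the shoelace formula to check the clockwise ordering described above. Points are assumed to be plain {x, y} objects.

```javascript
// Hypothetical helpers for DetectedText.cornerPoints. Points are plain
// {x, y} objects; image coordinates have y growing downwards.

// Smallest axis-aligned rectangle enclosing all corner points.
function enclosingRect(points) {
  const xs = points.map((p) => p.x);
  const ys = points.map((p) => p.y);
  const x = Math.min(...xs);
  const y = Math.min(...ys);
  return { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y };
}

// Signed area via the shoelace formula. With y pointing down, a positive
// value means the points are listed clockwise as seen on screen.
function signedArea(points) {
  let sum = 0;
  for (let i = 0; i < points.length; i++) {
    const a = points[i];
    const b = points[(i + 1) % points.length];
    sum += a.x * b.y - b.x * a.y;
  }
  return sum / 2;
}

function isClockwise(points) {
  return signedArea(points) > 0;
}

// A slightly skewed quadrilateral, listed clockwise from the top-left corner.
const corners = [
  { x: 10, y: 20 }, { x: 110, y: 25 }, { x: 108, y: 60 }, { x: 8, y: 55 },
];
console.log(enclosingRect(corners)); // { x: 8, y: 20, width: 102, height: 40 }
console.log(isClockwise(corners));   // true
```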

3. Examples

This section is non-normative.

Slightly modified/extended versions of these examples (and more) can be found in e.g. this codepen collection.

3.1. Platform support for a text detector

The following example can also be found in e.g. this codepen with minimal modifications.
if (window.TextDetector == undefined) {
  console.error('Text Detection not supported on this platform');
}
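The check above relies on window, which does not exist in Worker scopes even though TextDetector is exposed there too. A sketch of a check that works in either context, assuming an environment that supports globalThis; isTextDetectionSupported is a hypothetical helper name.

```javascript
// Hypothetical helper: globalThis refers to the global object in both
// Window and Worker scopes, and an interface object with a constructor
// (such as TextDetector) is a function.
function isTextDetectionSupported() {
  return typeof globalThis.TextDetector === 'function';
}

if (!isTextDetectionSupported()) {
  console.error('Text Detection not supported on this platform');
}
```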

3.2. Text Detection

The following example can also be found in e.g. this codepen.
const textDetector = new TextDetector();
// Assumes |theImage| is an ImageBitmapSource, e.g. an <img> element or a Blob.

textDetector.detect(theImage)
.then(detectedTextBlocks => {
  for (const textBlock of detectedTextBlocks) {
    console.log(
        `text @ (${textBlock.boundingBox.x}, ${textBlock.boundingBox.y}), ` +
        `size ${textBlock.boundingBox.width}x${textBlock.boundingBox.height}: ` +
        `"${textBlock.rawValue}"`);
  }
}).catch(error => {
  console.error('Text Detection failed:', error);
});
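This specification does not define an order for the returned text blocks, so a page assembling a transcript may want to sort them. The sketch below uses a naive heuristic, not an algorithm defined by this specification, and assumes left-to-right, top-to-bottom text: the hypothetical joinInReadingOrder groups blocks into rows whose top edges lie within rowTolerance pixels of the row's first block, orders each row left to right, and joins the rawValue strings.

```javascript
// Hypothetical helper: naive reading-order assembly of DetectedText-like
// objects (only rawValue and boundingBox.x/.y are used).
function joinInReadingOrder(blocks, rowTolerance = 10) {
  // Sort by top edge so rows can be built in a single pass.
  const byTop = [...blocks].sort((a, b) => a.boundingBox.y - b.boundingBox.y);
  const rows = [];
  for (const block of byTop) {
    const row = rows[rows.length - 1];
    // Stay in the current row while the top edge is close to the top edge
    // of the row's first block; otherwise start a new row.
    if (row && block.boundingBox.y - row[0].boundingBox.y <= rowTolerance) {
      row.push(block);
    } else {
      rows.push([block]);
    }
  }
  return rows
    .map((row) => row
      .sort((a, b) => a.boundingBox.x - b.boundingBox.x)
      .map((block) => block.rawValue)
      .join(' '))
    .join('\n');
}

// Stand-in results shaped like DetectedText (cornerPoints omitted).
const blocks = [
  { rawValue: 'World', boundingBox: { x: 120, y: 12, width: 80, height: 20 } },
  { rawValue: 'Second line', boundingBox: { x: 10, y: 50, width: 150, height: 20 } },
  { rawValue: 'Hello', boundingBox: { x: 10, y: 10, width: 90, height: 20 } },
];
const transcript = joinInReadingOrder(blocks); // "Hello World\nSecond line"
console.log(transcript);
```

With a real detector, this could be called as textDetector.detect(theImage).then(blocks => console.log(joinInReadingOrder(blocks))).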

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc.) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

References

Normative References

[GEOMETRY-1]
Simon Pieters; Chris Harrelson. Geometry Interfaces Module Level 1. URL: https://drafts.fxtf.org/geometry/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[IMAGE-CAPTURE]
Miguel Casas-sanchez; Rijubrata Bhaumik. MediaStream Image Capture. URL: https://w3c.github.io/mediacapture-image/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[SHAPE-DETECTION-API]
Accelerated Shape Detection in Images. cg-draft. URL: https://wicg.github.io/shape-detection-api/
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

Informative References

[ISO8859-1]
Information technology -- 8-bit single-byte coded graphic character sets -- Part 1: Latin alphabet No. 1. April 1998. URL: https://www.iso.org/standard/28245.html

IDL Index

[
    Exposed=(Window,Worker),
    SecureContext
] interface TextDetector {
    constructor();
    Promise<sequence<DetectedText>> detect(ImageBitmapSource image);
};

dictionary DetectedText {
  required DOMRectReadOnly boundingBox;
  required DOMString rawValue;
  required FrozenArray<Point2D> cornerPoints;
};