Accelerated Shape Detection

Draft Community Group Report,

This version:
https://wicg.github.io/shape-detection-api
Issue Tracking:
GitHub
Editor:
(Google Inc.)
Participate:
Join the W3C Community Group
Fix the text through GitHub

Abstract

This document describes an API, available in Chrome, for detecting shapes (for example, human faces) in still and/or moving images.

Status of this document

This specification was published by the Web Platform Incubator Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

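1. Introduction

Photos and images make up the largest share of the Web, and a good portion of them contain recognizable features such as human faces, QR codes, or text. Detecting these features is computationally expensive, but it enables interesting use cases, such as tagging faces in photos automatically or redirecting based on a URL found in an image. Hardware vendors have long supported these operations, yet Web apps have not been able to take advantage of that hardware and have had to rely on unwieldy libraries instead.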

1.1. Shape detection use cases

Please see the Readme/Explainer in the repository.

2. Shape Detection API

Individual browsers may provide detectors that indicate whether the underlying hardware offers accelerated operation.

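2.1. Image sources for detection

This section is inspired by HTML Canvas 2D Context §image-sources-for-2d-rendering-contexts.

ImageBitmapSource allows objects implementing any of several interfaces to be used as image sources for the detection process.

When the user agent (UA) is required to use a given ImageBitmapSource as the input argument to a detector's detect() method, it must run the following steps:

Note that if the ImageBitmapSource has a horizontal or vertical dimension equal to zero, the Promise is simply resolved with an empty sequence of detected objects.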
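
As a rough illustration of the outcomes described in this section, the sketch below wraps a detector's detect() call so that a rejection and an empty result can be told apart by the caller. The helper name detectOutcome and its status strings are hypothetical and are not part of this API.

```javascript
// Hypothetical helper (not defined by this specification).
// Resolves to { status, shapes, error } instead of rejecting:
//   'found'    -> detect() resolved with at least one detected object
//   'empty'    -> detect() resolved with an empty sequence
//                 (for example, a source with a zero dimension)
//   'rejected' -> detect() rejected
function detectOutcome(detector, image) {
  return detector.detect(image).then(
    (shapes) => ({
      status: shapes.length > 0 ? 'found' : 'empty',
      shapes,
      error: null,
    }),
    (error) => ({ status: 'rejected', shapes: [], error })
  );
}
```

Any object with a Promise-returning detect() method can be passed in, so the same wrapper would apply to FaceDetector, BarcodeDetector and TextDetector alike.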

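2.2. Face Detection API

FaceDetector represents an underlying accelerated platform component for detecting human faces in images. It can be created with an optional FaceDetectorOptions dictionary. It provides a single detect() operation on an ImageBitmapSource, the result of which is a Promise. This method must reject the Promise in the cases detailed in § 2.1 Image sources for detection; otherwise it may queue a task that uses OS/platform resources to resolve the Promise with a sequence of DetectedFaces, each one essentially consisting of, and delimited by, a boundingBox.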

dictionary FaceDetectorOptions {
  unsigned short maxDetectedFaces;
  boolean fastMode;
};
maxDetectedFaces, of type unsigned short
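The maximum number of faces to be detected in the current scene.
fastMode, of type boolean
Hints that the UA should try to prioritize speed over accuracy, for example by operating at a reduced scale or by looking only for larger features.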
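
Because maxDetectedFaces is declared as an unsigned short, a caller may prefer to validate the value before constructing a detector; under the usual Web IDL conversion rules an out-of-range number would be wrapped rather than rejected. A minimal sketch, assuming a hypothetical helper makeFaceDetectorOptions that is not part of this API:

```javascript
// Hypothetical helper (not defined by this specification).
const UNSIGNED_SHORT_MAX = 65535; // largest value an IDL unsigned short can hold

// Builds a FaceDetectorOptions-shaped object, rejecting values of
// maxDetectedFaces that do not fit an unsigned short.
function makeFaceDetectorOptions({ maxDetectedFaces, fastMode } = {}) {
  const options = {};
  if (maxDetectedFaces !== undefined) {
    if (!Number.isInteger(maxDetectedFaces) ||
        maxDetectedFaces < 0 ||
        maxDetectedFaces > UNSIGNED_SHORT_MAX) {
      throw new RangeError('maxDetectedFaces must be an integer in [0, 65535]');
    }
    options.maxDetectedFaces = maxDetectedFaces;
  }
  if (fastMode !== undefined) {
    options.fastMode = Boolean(fastMode);
  }
  return options;
}
```

Where FaceDetector is available, the result could be passed straight to the constructor, for example new FaceDetector(makeFaceDetectorOptions({maxDetectedFaces: 5, fastMode: true})).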
[Exposed=(Window,Worker), Constructor(optional FaceDetectorOptions faceDetectorOptions)]
interface FaceDetector {
  Promise<sequence<DetectedFace>> detect(ImageBitmapSource image);
};
FaceDetector(optional FaceDetectorOptions faceDetectorOptions)
Constructs a new FaceDetector with the optional faceDetectorOptions.
detect()
Tries to detect human faces in the ImageBitmapSource image. Detected faces, if any, are returned as a sequence of DetectedFaces.
interface DetectedFace {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
};
boundingBox, of type DOMRectReadOnly, readonly
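A rectangle, aligned with the image axes, that indicates the position and extent of a detected feature.
Example implementations of face detection include Android FaceDetector, Apple’s CIFaceFeature and Windows 10 FaceDetector.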
Consider adding attributes such as, e.g.:
[SameObject] readonly attribute unsigned long id;
[SameObject] readonly attribute FrozenArray<Landmark>? landmarks;

to DetectedFace.
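
Since boundingBox is expressed in image coordinates, overlaying it on an element that displays the image at a different size needs a coordinate mapping. The following is a sketch only; scaleBoundingBox is a hypothetical helper, and it assumes the whole image is stretched to fill the display area without cropping or letterboxing.

```javascript
// Hypothetical helper (not defined by this specification).
// Maps a boundingBox given in image pixels to the coordinate space of an
// element that shows the whole image at displayWidth x displayHeight.
function scaleBoundingBox(box, imageWidth, imageHeight, displayWidth, displayHeight) {
  if (imageWidth <= 0 || imageHeight <= 0) {
    throw new RangeError('image dimensions must be positive');
  }
  const scaleX = displayWidth / imageWidth;
  const scaleY = displayHeight / imageHeight;
  return {
    x: box.x * scaleX,
    y: box.y * scaleY,
    width: box.width * scaleX,
    height: box.height * scaleY,
  };
}
```

For an img element, the image size would typically come from naturalWidth and naturalHeight, and the display size from clientWidth and clientHeight.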

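2.3. Barcode Detection API

BarcodeDetector represents an underlying accelerated platform component for detecting QR codes or barcodes in images. It provides a single detect() operation on an ImageBitmapSource, the result of which is a Promise. This method must reject the Promise in the cases detailed in § 2.1 Image sources for detection; otherwise it may queue a task that uses OS/platform resources to resolve the Promise with a sequence of DetectedBarcodes, each one essentially consisting of, and delimited by, a boundingBox and a series of Point2Ds, and possibly a rawValue holding the decoded DOMString.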

[Exposed=(Window,Worker), Constructor()]
interface BarcodeDetector {
  Promise<sequence<DetectedBarcode>> detect(ImageBitmapSource image);
};
detect(ImageBitmapSource image)
Tries to detect barcodes in the ImageBitmapSource image.
interface DetectedBarcode {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
  [SameObject] readonly attribute DOMString rawValue;
  [SameObject] readonly attribute FrozenArray<Point2D> cornerPoints;
};
boundingBox, of type DOMRectReadOnly, readonly
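A rectangle, aligned with the image axes, that indicates the position and extent of a detected feature.
rawValue, of type DOMString, readonly
The string decoded from the barcode. This value might be multiline.
cornerPoints, of type FrozenArray<Point2D>, readonly
A sequence of corner points of the detected barcode, listed clockwise starting from the top-left corner. Because of possible perspective distortion, the points do not necessarily form a square.
Example implementations of barcode/QR code detection include Google Play Services and Apple’s CIQRCodeFeature.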
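
As an illustration of working with cornerPoints, the sketch below computes the centroid of the corner points and the area they enclose. Both helper names are hypothetical, and each point is assumed to be an object with numeric x and y members, which this document does not itself define.

```javascript
// Hypothetical helpers (not defined by this specification).
// Each point is assumed to look like { x: number, y: number }.

// Average position of the corner points.
function cornerCentroid(points) {
  if (points.length === 0) {
    throw new RangeError('no corner points');
  }
  let sumX = 0;
  let sumY = 0;
  for (const p of points) {
    sumX += p.x;
    sumY += p.y;
  }
  return { x: sumX / points.length, y: sumY / points.length };
}

// Area enclosed by the corner points, using the shoelace formula.
// Valid for simple (non self-intersecting) polygons in either winding order.
function cornerArea(points) {
  let twiceArea = 0;
  for (let i = 0; i < points.length; i++) {
    const a = points[i];
    const b = points[(i + 1) % points.length];
    twiceArea += a.x * b.y - b.x * a.y;
  }
  return Math.abs(twiceArea) / 2;
}
```

Comparing cornerArea with the area of boundingBox gives a rough indication of how strongly a code is rotated or distorted by perspective.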

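2.4. Text Detection API

TextDetector represents an underlying accelerated platform component for detecting text in images. It provides a single detect() operation on an ImageBitmapSource, the result of which is a Promise. This method must reject the Promise in the cases detailed in § 2.1 Image sources for detection; otherwise it may queue a task that uses OS/platform resources to resolve the Promise with a sequence of DetectedTexts, each one essentially consisting of a rawValue and delimited by a boundingBox.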

[
    Constructor,
    Exposed=(Window,Worker),
] interface TextDetector {
    Promise<sequence<DetectedText>> detect(ImageBitmapSource image);
};
detect(ImageBitmapSource image)
Tries to detect text blocks in the ImageBitmapSource image.
[
    Constructor,
] interface DetectedText {
    [SameObject] readonly attribute DOMRect boundingBox;
    [SameObject] readonly attribute DOMString rawValue;
};
boundingBox, of type DOMRect, readonly
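A rectangle, aligned with the image axes, that indicates the position and extent of a detected feature.
rawValue, of type DOMString, readonly
The raw string detected in the image.
Example implementations of text detection include Google Play Services, Apple’s CIDetector and Windows 10 OCR API.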
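
Nothing in this section states the order in which text blocks are returned, so a caller that wants a single string may need to order them first. The sketch below sorts blocks by the top-left corner of their boundingBox; textInReadingOrder is a hypothetical helper, and it naively assumes a left-to-right, top-to-bottom script with blocks on the same line sharing the same y value.

```javascript
// Hypothetical helper (not defined by this specification).
// Orders text blocks top-to-bottom, then left-to-right, and joins their
// rawValue strings with newlines. The input sequence is not modified.
function textInReadingOrder(detectedTextBlocks) {
  return [...detectedTextBlocks]
    .sort((a, b) =>
      a.boundingBox.y - b.boundingBox.y ||
      a.boundingBox.x - b.boundingBox.x)
    .map((block) => block.rawValue)
    .join('\n');
}
```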

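3. Examples

Slightly modified or extended versions of these examples, along with further examples, can be found in this codepen collection.

3.1. Platform support for a given detector

A slightly modified version of the following example can also be found in this codepen.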
if (window.FaceDetector == undefined) {
  console.error('Face Detection not supported on this platform');
}
if (window.BarcodeDetector == undefined) {
  console.error('Barcode Detection not supported on this platform');
}
if (window.TextDetector == undefined) {
  console.error('Text Detection not supported on this platform');
}
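
The interfaces are exposed to both Window and Worker, but the check above refers to window, which does not exist in a worker. A possible variant, sketched here with a hypothetical helper named supportedDetectors, looks the constructors up on globalThis instead:

```javascript
// Hypothetical helper (not defined by this specification).
// Reports which detector constructors exist on the given global scope.
// globalThis is the Window in a page and the worker global scope in a Worker.
function supportedDetectors(scope = globalThis) {
  return {
    face: typeof scope.FaceDetector === 'function',
    barcode: typeof scope.BarcodeDetector === 'function',
    text: typeof scope.TextDetector === 'function',
  };
}
```

The scope parameter exists only to make the helper easy to exercise with a stand-in object; in normal use it would be called without arguments.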

3.2. Face Detection

The following example can also be found in this codepen, or in this codepen with a bounding box overlaid on the image.
let faceDetector = new FaceDetector({fastMode: true, maxDetectedFaces: 1});
// Assuming |theImage| is e.g. a <img> content, or a Blob.

faceDetector.detect(theImage)
.then(detectedFaces => {
  for (const face of detectedFaces) {
    console.log(` Face @ (${face.boundingBox.x}, ${face.boundingBox.y}),` +
        ` size ${face.boundingBox.width}x${face.boundingBox.height}`);
  }
}).catch(() => {
  console.error("Face Detection failed, boo.");
})
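
To show the results visually rather than in the console, the detected boxes could be stroked on a canvas. This is a sketch under stated assumptions: drawFaceBoxes is a hypothetical helper, and ctx is assumed to be a CanvasRenderingContext2D whose canvas already shows the image at its natural size, so that image and canvas coordinates coincide.

```javascript
// Hypothetical helper (not defined by this specification).
// Strokes one rectangle per detected face and returns how many were drawn.
function drawFaceBoxes(ctx, detectedFaces, color = 'red') {
  ctx.strokeStyle = color;
  ctx.lineWidth = 2;
  for (const face of detectedFaces) {
    const { x, y, width, height } = face.boundingBox;
    ctx.strokeRect(x, y, width, height);
  }
  return detectedFaces.length;
}
```

It would typically be called from the then() callback of detect(), for example drawFaceBoxes(canvas.getContext('2d'), detectedFaces).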

3.3. Barcode Detection

The following example can also be found in this codepen, or in this codepen with a bounding box overlaid on the image.
let barcodeDetector = new BarcodeDetector();
// Assuming |theImage| is e.g. a <img> content, or a Blob.

barcodeDetector.detect(theImage)
.then(detectedCodes => {
  for (const barcode of detectedCodes) {
    console.log(` Barcode ${barcode.rawValue}` +
        ` @ (${barcode.boundingBox.x}, ${barcode.boundingBox.y}) with size` +
        ` ${barcode.boundingBox.width}x${barcode.boundingBox.height}`);
  }
}).catch(() => {
  console.error("Barcode Detection failed, boo.");
})
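
The introduction mentions redirecting based on a URL found in an image. One cautious way to approach that, sketched here with a hypothetical helper named firstHttpUrl, is to accept only rawValue strings that parse as absolute http or https URLs:

```javascript
// Hypothetical helper (not defined by this specification).
// Returns the first rawValue that parses as an http(s) URL, or null if none does.
function firstHttpUrl(detectedBarcodes) {
  for (const barcode of detectedBarcodes) {
    let url;
    try {
      url = new URL(barcode.rawValue.trim());
    } catch (e) {
      continue; // not an absolute URL
    }
    if (url.protocol === 'http:' || url.protocol === 'https:') {
      return url.href;
    }
  }
  return null;
}
```

Restricting the scheme keeps values such as javascript: URLs from being followed; whether to navigate at all, and whether to ask the user first, remains a decision for the application.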

3.4. Text Detection

The following example can also be found in this codepen, or in this example integrated with video capture.
let textDetector = new TextDetector();
// Assuming |theImage| is e.g. a <img> content, or a Blob.

textDetector.detect(theImage)
.then(detectedTextBlocks => {
  for (const textBlock of detectedTextBlocks) {
    console.log(
        `text @ (${textBlock.boundingBox.x}, ${textBlock.boundingBox.y}), ` +
        `size ${textBlock.boundingBox.width}x${textBlock.boundingBox.height}`);
  }
}).catch(() => {
  console.error("Text Detection failed, boo.");
})

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

[GEOMETRY-1]
Simon Pieters; Chris Harrelson. Geometry Interfaces Module Level 1. 4 December 2018. CR. URL: https://www.w3.org/TR/geometry-1/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[WebIDL]
Boris Zbarsky. Web IDL. 15 December 2016. ED. URL: https://heycam.github.io/webidl/

IDL Index

dictionary FaceDetectorOptions {
  unsigned short maxDetectedFaces;
  boolean fastMode;
};

[Exposed=(Window,Worker), Constructor(optional FaceDetectorOptions faceDetectorOptions)]
interface FaceDetector {
  Promise<sequence<DetectedFace>> detect(ImageBitmapSource image);
};

interface DetectedFace {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
};

[Exposed=(Window,Worker), Constructor()]
interface BarcodeDetector {
  Promise<sequence<DetectedBarcode>> detect(ImageBitmapSource image);
};

interface DetectedBarcode {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
  [SameObject] readonly attribute DOMString rawValue;
  [SameObject] readonly attribute FrozenArray<Point2D> cornerPoints;
};

[
    Constructor,
    Exposed=(Window,Worker),
] interface TextDetector {
    Promise<sequence<DetectedText>> detect(ImageBitmapSource image);
};

[
    Constructor,
] interface DetectedText {
    [SameObject] readonly attribute DOMRect boundingBox;
    [SameObject] readonly attribute DOMString rawValue;
};