Accelerated Shape Detection

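1. Introduction

Photos and images make up the largest share of content on the Web, and many of them contain recognisable features such as human faces, QR codes or text. Detecting these features is, unsurprisingly, computationally expensive, yet it enables interesting use cases such as automatically tagging faces in photos or redirecting to a URL found in an image. Hardware manufacturers have supported these features for a long time, but Web apps have not yet been able to make good use of these hardware capabilities and have had to rely on hard-to-use libraries instead.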

1.1. Shape detection use cases

Please see the Readme/Explainer document in the repository.

2. Shape Detection API

Individual browsers may provide detectors that indicate whether the current hardware offers accelerated operation.

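2.1. Image sources for detection

This section is inspired by HTML Canvas 2D Context §image-sources-for-2d-rendering-contexts.

ImageBitmapSource allows objects implementing any of a number of image interfaces to be used as image sources for the detection process.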

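When an ImageBitmapSource object represents an HTMLImageElement, the element's image must be used as the source image. Specifically, when an ImageBitmapSource object represents an animated image in an HTMLImageElement, the user agent must use the default image of the animation (the one that is to be used when animation is disabled or not supported), or, if there is no such image, the first frame of the animation.
When an ImageBitmapSource object represents an HTMLVideoElement, the frame at the current playback position must be used as the source image, and the source image's dimensions must be the intrinsic dimensions of the media resource, i.e. its size after any aspect-ratio correction has been applied.
When an ImageBitmapSource object represents an HTMLCanvasElement, the element's bitmap must be used as the source image.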

When the user agent (UA) is required to use a given ImageBitmapSource as the input argument of a detector's detect() method, it must run these steps:

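If the ImageBitmapSource has an effective script origin that is not the same as the current Document's effective script origin, then reject the Promise with a new DOMException whose name is SecurityError.
If the ImageBitmapSource is an HTMLImageElement object that is in the broken state, then reject the Promise with a new DOMException whose name is InvalidStateError, and abort any further steps.
If the ImageBitmapSource is an HTMLImageElement object that is not fully decodable, then reject the Promise with a new DOMException whose name is InvalidStateError, and abort any further steps.
If the ImageBitmapSource is an HTMLVideoElement object whose readyState attribute is either HAVE_NOTHING or HAVE_METADATA, then reject the Promise with a new DOMException whose name is InvalidStateError, and abort any further steps.
If the ImageBitmapSource is an HTMLCanvasElement object whose bitmap's origin-clean flag is false, then reject the Promise with a new DOMException whose name is SecurityError, and abort any further steps.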

Note that if the ImageBitmapSource has a horizontal or vertical dimension equal to zero, the Promise is simply resolved with an empty sequence of detected objects.
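
The checks above can be sketched as a plain function. This is an illustrative model only: the function name predictDetectOutcome, the descriptor object and its fields (kind, sameOrigin, broken, fullyDecodable, originClean, width, height) are hypothetical and not part of the API. A real user agent performs these checks internally on actual elements.

```javascript
// readyState constants as defined on HTMLMediaElement.
const HAVE_NOTHING = 0;
const HAVE_METADATA = 1;

// Hypothetical model of the steps above. `source` is a plain descriptor
// object invented for this sketch, not a real ImageBitmapSource.
function predictDetectOutcome(source) {
  if (!source.sameOrigin) {
    return { rejected: 'SecurityError' };
  }
  if (source.kind === 'image' && (source.broken || !source.fullyDecodable)) {
    return { rejected: 'InvalidStateError' };
  }
  if (source.kind === 'video' &&
      (source.readyState === HAVE_NOTHING ||
       source.readyState === HAVE_METADATA)) {
    return { rejected: 'InvalidStateError' };
  }
  if (source.kind === 'canvas' && !source.originClean) {
    return { rejected: 'SecurityError' };
  }
  // A zero-sized source is not an error: the Promise resolves with an
  // empty sequence of detected objects.
  if (source.width === 0 || source.height === 0) {
    return { resolved: [] };
  }
  // Otherwise the result depends on what the detector finds.
  return { resolved: 'detector-dependent' };
}

console.log(predictDetectOutcome({
  kind: 'video', sameOrigin: true, readyState: HAVE_METADATA,
  width: 640, height: 480,
}));
```

In page code the same distinction shows up in the catch() handler of detect(), where the name of the rejection reason can be compared against SecurityError and InvalidStateError.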

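2.2. Face Detection API

FaceDetector represents an underlying accelerated platform component for detecting human faces in images. It can be created with an optional FaceDetectorOptions dictionary. It provides a single detect() operation on an ImageBitmapSource, whose result is a Promise. This method must reject the Promise in the cases detailed in § 2.1 Image sources for detection; otherwise it may queue a task that uses OS or platform resources to resolve the Promise with a sequence of DetectedFace objects, each one essentially consisting of, and delimited by, a boundingBox.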

dictionary FaceDetectorOptions {
  unsigned short maxDetectedFaces;
  boolean fastMode;
};

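maxDetectedFaces, of type unsigned short: A hint to the UA to try to limit the number of faces detected in the scene to this maximum.
fastMode, of type boolean: A hint to the UA to try to prioritise speed over accuracy, e.g. by operating on a reduced scale or by looking for larger features.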

[Exposed=(Window,Worker), Constructor(optional FaceDetectorOptions faceDetectorOptions)]
interface FaceDetector {
  Promise<sequence<DetectedFace>> detect(ImageBitmapSource image);
};

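FaceDetector(optional FaceDetectorOptions faceDetectorOptions): Constructs a new FaceDetector with the optional faceDetectorOptions.
detect(): Tries to detect human faces in the ImageBitmapSource image. Detected faces, if any, are returned as a sequence of DetectedFace objects.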

interface DetectedFace {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
};

boundingBox, of type DOMRectReadOnly, readonly: A rectangle, aligned to the image axes, indicating the position and extent of a detected feature.
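
When a boundingBox is drawn over an image that is displayed at a different size from its source, the rectangle has to be rescaled. The following is a minimal sketch, assuming the boundingBox is expressed in source-image pixels and that the image is stretched uniformly to the display size (no letterboxing). The helper name scaleBoundingBox and the plain size objects are hypothetical and not part of the API.

```javascript
// Hypothetical helper: maps a boundingBox from source-image pixels to the
// pixels of an element that displays the same image at a different size.
function scaleBoundingBox(box, sourceSize, displaySize) {
  const scaleX = displaySize.width / sourceSize.width;
  const scaleY = displaySize.height / sourceSize.height;
  return {
    x: box.x * scaleX,
    y: box.y * scaleY,
    width: box.width * scaleX,
    height: box.height * scaleY,
  };
}

// A plain object standing in for a DOMRectReadOnly.
const sampleBox = { x: 100, y: 50, width: 200, height: 100 };
console.log(scaleBoundingBox(
    sampleBox, { width: 1000, height: 500 }, { width: 500, height: 250 }));
```

For an <img> element, the source size would typically come from naturalWidth and naturalHeight and the display size from the element's layout box.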

Example implementations of face detection include Android FaceDetector, Apple's CIFaceFeature and Windows 10 FaceDetector.

Consider adding attributes such as, e.g.:

[SameObject] readonly attribute unsigned long id;
[SameObject] readonly attribute FrozenArray<Landmark>? landmarks;

to DetectedFace.

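2.3. Barcode Detection API

BarcodeDetector represents an underlying accelerated platform component for detecting QR codes or barcodes in images. It provides a single detect() operation on an ImageBitmapSource, whose result is a Promise. This method must reject the Promise in the cases detailed in § 2.1 Image sources for detection; otherwise it may queue a task that uses OS or platform resources to resolve the Promise with a sequence of DetectedBarcode objects, each one essentially consisting of, and delimited by, a boundingBox and a series of Point2D objects, and possibly a decoded DOMString rawValue.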

[Exposed=(Window,Worker), Constructor()]
interface BarcodeDetector {
  Promise<sequence<DetectedBarcode>> detect(ImageBitmapSource image);
};

detect(ImageBitmapSource image): Tries to detect barcodes in the ImageBitmapSource image.

interface DetectedBarcode {
  [SameObject] readonly attribute DOMRectReadOnly boundingBox;
  [SameObject] readonly attribute DOMString rawValue;
  [SameObject] readonly attribute FrozenArray<Point2D> cornerPoints;
};

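boundingBox, of type DOMRectReadOnly, readonly: A rectangle, aligned to the image axes, indicating the position and extent of a detected feature.
rawValue, of type DOMString, readonly: The DOMString decoded from the barcode. This value might be multiline.
cornerPoints, of type FrozenArray<Point2D>, readonly: A sequence of the corner points of the detected barcode, in clockwise order starting with the top-left corner. Because of possible perspective distortion, these points do not necessarily form a square.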
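
To illustrate how cornerPoints relate to an axis-aligned rectangle, the sketch below computes the smallest axis-aligned box enclosing a set of corner points. The helper name boundsOfCornerPoints is hypothetical, and the text above does not state that boundingBox is derived this way; treating the two as equivalent is an assumption made only for illustration.

```javascript
// Hypothetical helper: smallest axis-aligned rectangle that encloses the
// given corner points. Points are plain {x, y} objects.
function boundsOfCornerPoints(cornerPoints) {
  const xs = cornerPoints.map(point => point.x);
  const ys = cornerPoints.map(point => point.y);
  const left = Math.min(...xs);
  const top = Math.min(...ys);
  return {
    x: left,
    y: top,
    width: Math.max(...xs) - left,
    height: Math.max(...ys) - top,
  };
}

// A perspective-distorted quadrilateral, clockwise from the top-left corner.
const skewedCorners = [
  { x: 10, y: 20 }, { x: 110, y: 30 }, { x: 100, y: 130 }, { x: 0, y: 120 },
];
console.log(boundsOfCornerPoints(skewedCorners));
```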

Example implementations of barcode/QR code detection include Google Play Services and Apple's CIQRCodeFeature.

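2.4. Text Detection API

TextDetector represents an underlying accelerated platform component for detecting text in images. It provides a single detect() operation on an ImageBitmapSource, whose result is a Promise. This method must reject the Promise in the cases detailed in § 2.1 Image sources for detection; otherwise it may queue a task that uses OS or platform resources to resolve the Promise with a sequence of DetectedText objects, each one essentially consisting of a rawValue and delimited by a boundingBox.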

[
    Constructor,
    Exposed=(Window,Worker),
] interface TextDetector {
    Promise<sequence<DetectedText>> detect(ImageBitmapSource image);
};

detect(ImageBitmapSource image): Tries to detect text blocks in the ImageBitmapSource image.

[
    Constructor,
] interface DetectedText {
    [SameObject] readonly attribute DOMRect boundingBox;
    [SameObject] readonly attribute DOMString rawValue;
};

boundingBox, of type DOMRect, readonly: A rectangle, aligned to the image axes, indicating the position and extent of a detected feature.
rawValue, of type DOMString, readonly: The raw string detected in the image.
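
The two attributes can be combined to assemble detected blocks into readable text. The sketch below sorts blocks top to bottom and then left to right before joining their rawValue strings. It assumes the order of the returned sequence is not meaningful, which the text above neither confirms nor denies, and it compares exact y values, so it is only suitable for well-aligned text. The helper name textInReadingOrder is hypothetical.

```javascript
// Hypothetical helper: joins the rawValue of each detected text block in a
// naive reading order (top to bottom, then left to right).
function textInReadingOrder(detectedTexts) {
  return [...detectedTexts]  // copy, so the input sequence is not mutated
      .sort((a, b) =>
          a.boundingBox.y - b.boundingBox.y ||
          a.boundingBox.x - b.boundingBox.x)
      .map(block => block.rawValue)
      .join('\n');
}

// Plain objects standing in for DetectedText instances.
const sampleBlocks = [
  { rawValue: 'world', boundingBox: { x: 0, y: 50, width: 80, height: 20 } },
  { rawValue: 'there', boundingBox: { x: 100, y: 10, width: 80, height: 20 } },
  { rawValue: 'hello', boundingBox: { x: 0, y: 10, width: 80, height: 20 } },
];
console.log(textInReadingOrder(sampleBlocks));
```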

Example implementations of text detection include Google Play Services, Apple's CIDetector and the Windows 10 OCR API.

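3. Examples

Slightly modified or extended versions of the following examples, and more, can be found in this codepen collection.

3.1. Platform support for a given detector

The following example can also be found in this codepen with minimal modifications.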

if (window.FaceDetector == undefined) {
  console.error('Face Detection not supported on this platform');
}
if (window.BarcodeDetector == undefined) {
  console.error('Barcode Detection not supported on this platform');
}
if (window.TextDetector == undefined) {
  console.error('Text Detection not supported on this platform');
}
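
Because the detector interfaces are declared with Exposed=(Window,Worker), a check that goes through window does not work inside a worker. A possible variant, sketched below, looks the constructors up on globalThis instead. The helper name supportedDetectors and its scope parameter are hypothetical and only serve this illustration.

```javascript
// Hypothetical helper: returns the names of the detector constructors that
// exist on the given global scope (Window or Worker).
function supportedDetectors(scope = globalThis) {
  return ['FaceDetector', 'BarcodeDetector', 'TextDetector']
      .filter(name => typeof scope[name] === 'function');
}

console.log('Supported detectors:', supportedDetectors());
```

Passing the scope explicitly also makes the check easy to exercise with a stand-in object in tests.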

3.2. Face Detection

The following example can also be found in this codepen, or in this one with a bounding box overlaid on the image.

let faceDetector = new FaceDetector({fastMode: true, maxDetectedFaces: 1});
// Assuming |theImage| is e.g. a <img> content, or a Blob.

faceDetector.detect(theImage)
.then(detectedFaces => {
  for (const face of detectedFaces) {
    // Template literals need backticks for ${...} to be interpolated.
    console.log(` Face @ (${face.boundingBox.x}, ${face.boundingBox.y}),` +
        ` size ${face.boundingBox.width}x${face.boundingBox.height}`);
  }
}).catch(() => {
  console.error("Face Detection failed, boo.");
})

3.3. Barcode Detection

The following example can also be found in this codepen, or in this one with a bounding box overlaid on the image.

let barcodeDetector = new BarcodeDetector();
// Assuming |theImage| is e.g. a <img> content, or a Blob.

barcodeDetector.detect(theImage)
.then(detectedCodes => {
  for (const barcode of detectedCodes) {
    // Template literals need backticks for ${...} to be interpolated.
    console.log(` Barcode ${barcode.rawValue}` +
        ` @ (${barcode.boundingBox.x}, ${barcode.boundingBox.y}) with size` +
        ` ${barcode.boundingBox.width}x${barcode.boundingBox.height}`);
  }
}).catch(() => {
  console.error("Barcode Detection failed, boo.");
})
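
One use case mentioned in the introduction is redirecting to a URL found in an image. The sketch below picks the first decoded rawValue that parses as an absolute http or https URL, using the standard URL constructor. The helper name firstHttpUrl is hypothetical, and restricting the result to http and https is a design choice of this sketch, made so that values such as javascript: URLs are never returned.

```javascript
// Hypothetical helper: returns the first rawValue that is an absolute
// http(s) URL, or null if none of the detected barcodes contains one.
function firstHttpUrl(detectedBarcodes) {
  for (const { rawValue } of detectedBarcodes) {
    try {
      const url = new URL(rawValue);
      if (url.protocol === 'http:' || url.protocol === 'https:') {
        return url.href;
      }
    } catch {
      // rawValue is not an absolute URL; try the next barcode.
    }
  }
  return null;
}

// Plain objects standing in for DetectedBarcode instances.
console.log(firstHttpUrl([
  { rawValue: '4006381333931' },
  { rawValue: 'https://example.com/a' },
]));
```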

3.4. Text Detection

The following example can also be found in this codepen, or in this one with video capture integrated.

let textDetector = new TextDetector();
// Assuming |theImage| is e.g. a <img> content, or a Blob.

textDetector.detect(theImage)
.then(detectedTextBlocks => {
  for (const textBlock of detectedTextBlocks) {
    console.log(
        // Template literals need backticks for ${...} to be interpolated.
        `text @ (${textBlock.boundingBox.x}, ${textBlock.boundingBox.y}), ` +
        `size ${textBlock.boundingBox.width}x${textBlock.boundingBox.height}`);
  }
}).catch(() => {
  console.error("Text Detection failed, boo.");
})

Draft Community Group Report, 18 August 2020
