1. Introduction
Photos and images constitute the largest chunk of the Web, and many include recognisable features, such as human faces, QR codes or text. Detecting these features is computationally expensive, but enables interesting use cases, e.g. face tagging or web URL redirection. This document deals with text detection, whereas the sister document [SHAPE-DETECTION-API] specifies the Face and Barcode detection cases and APIs.
1.1. Text detection use cases
Please see the Readme/Explainer in the repository.
2. Text Detection API
Individual browsers MAY provide a TextDetector to perform text detection in images,
potentially leveraging hardware acceleration or additional dependent libraries.
The availability() method allows developers to check for the availability
of these capabilities and specific language support.
2.1. Image sources for detection
Please refer to Accelerated Shape Detection in Images § 2.1 Image sources for detection
2.2. Text Detection API
TextDetector represents an underlying accelerated platform’s component for detecting Latin-1 text, as defined in [iso8859-1], in images. It provides a single detect() operation on an ImageBitmapSource, the result of which is a Promise. This method must reject this Promise in the cases detailed in § 2.1 Image sources for detection; otherwise it may queue a task using the OS/Platform resources to resolve the Promise with a sequence of DetectedTexts, each one essentially consisting of a rawValue and delimited by a boundingBox and a series of Point2Ds.
    dictionary TextDetectorOptions {
      required sequence<DOMString> languages;
    };

    dictionary TextDetectorCreateOptions {
      AbortSignal signal;
      sequence<DOMString> languages;
    };

    [Exposed=(Window,Worker), SecureContext]
    interface TextDetector {
      constructor();
      static Promise<Availability> availability(TextDetectorOptions options);
      static Promise<TextDetector> create(optional TextDetectorCreateOptions options = {});
      Promise<sequence<DetectedText>> detect(ImageBitmapSource image);
    };
TextDetector()
    Detectors may potentially allocate and hold significant resources. Where possible, reuse the same TextDetector for several detections.
availability(TextDetectorOptions options)
    Returns a Promise that resolves with an Availability value indicating the overall availability status of text detection for the languages specified in options.
    The returned Availability value is determined by the following precedence, applied across all requested languages:
    - If any requested language is "unavailable", the method returns "unavailable".
    - Otherwise, if any requested language is "downloadable", the method returns "downloadable".
    - Otherwise, if any requested language is "downloading", the method returns "downloading".
    - Otherwise, all requested languages are "available", and the method returns "available".
    This method allows developers to check for specific language support before attempting to create a TextDetector instance.
create(optional TextDetectorCreateOptions options)
    Returns a Promise that resolves with a new TextDetector instance.
    This factory method handles the asynchronous initialization of the text detector, including downloading necessary resources. It is recommended to use this asynchronous method over the synchronous constructor to accommodate potential delays from dependency downloads or initialization, ensuring a smoother user experience.
detect(ImageBitmapSource image)
    Tries to detect text blocks in the ImageBitmapSource image.
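The precedence described for availability() above can be read as picking the "least available" status among the requested languages. The following is a minimal, non-normative sketch of that reduction. The names `PRECEDENCE` and `combineAvailability`, and the idea of an array holding one status per language, are hypothetical and used only for illustration; they are not part of the API.

```javascript
// Statuses ordered from least to most available, mirroring the precedence
// described for availability(). Illustrative only; not part of the API.
const PRECEDENCE = ['unavailable', 'downloadable', 'downloading', 'available'];

// Hypothetical helper: given one status per requested language, return the
// overall status, i.e. the first entry of PRECEDENCE present in the input.
function combineAvailability(perLanguageStatuses) {
  for (const status of PRECEDENCE) {
    if (perLanguageStatuses.includes(status)) {
      return status;
    }
  }
  // Assumption: with no languages requested, nothing is missing.
  return 'available';
}
```

For example, a request where one language is ready and another is still being fetched would be reported as "downloading" as a whole under this reading.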
2.2.1. DetectedText
    dictionary DetectedText {
      required DOMRectReadOnly boundingBox;
      required DOMString rawValue;
      required sequence<Point2D> cornerPoints;
    };
boundingBox, of type DOMRectReadOnly
    A rectangle indicating the position and extent of a detected feature aligned to the image.
rawValue, of type DOMString
    Raw string detected from the image, where characters are drawn from [iso8859-1].
cornerPoints, of type sequence<Point2D>
    A sequence of corner points of the detected feature, in clockwise direction and starting with top-left. This is not necessarily a square due to possible perspective distortions.
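To illustrate how an image-aligned rectangle relates to a possibly skewed set of corner points, the sketch below computes the axis-aligned rectangle that encloses a sequence of points. It assumes, which this document does not state, that a bounding box is the tightest such rectangle. `enclosingRect` is a hypothetical helper, and it returns a plain object rather than a DOMRectReadOnly so that it can run outside a browser.

```javascript
// Hypothetical helper: smallest axis-aligned rectangle containing all points.
// Each point is expected to have numeric x and y members, like Point2D.
function enclosingRect(cornerPoints) {
  const xs = cornerPoints.map(point => point.x);
  const ys = cornerPoints.map(point => point.y);
  const left = Math.min(...xs);
  const top = Math.min(...ys);
  return {
    x: left,
    y: top,
    width: Math.max(...xs) - left,
    height: Math.max(...ys) - top,
  };
}

// Corner points of a slightly skewed text block, clockwise from top-left.
const rect = enclosingRect([
  {x: 10, y: 20},
  {x: 110, y: 30},
  {x: 105, y: 80},
  {x: 5, y: 70},
]);
```

A page drawing an overlay would typically use cornerPoints for a tight outline and a rectangle like this one for simpler hit testing.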
3. Examples
This section is non-normative.
3.1. Platform support for a text detector
    if (!('TextDetector' in window)) {
      console.error('Text Detection not supported on this platform');
    } else {
      const languages = ['en', 'es'];  // English and Spanish
      TextDetector.availability({languages: languages}).then(availability => {
        if (availability === 'unavailable') {
          console.log('Not all of the requested languages are supported.');
          return;
        }
        if (availability === 'downloadable') {
          console.log('Languages need to be downloaded first.');
        } else if (availability === 'downloading') {
          console.log('Languages are currently being downloaded.');
        } else {
          console.log('All requested languages are supported.');
        }
        // Now you can create a TextDetector with the supported languages.
        // If the status was 'downloadable' or 'downloading', create() will wait
        // for the download to finish before resolving.
        TextDetector.create({languages: languages}).then(detector => {
          // ... use the detector
        });
      });
    }
3.2. Text Detection
    (async () => {
      // Assuming |theImage| is e.g. a <img> content, or a Blob.
      try {
        // The legacy synchronous constructor is still supported,
        // but the async create() method is recommended.
        // let textDetector = new TextDetector();
        let textDetector = await TextDetector.create();
        const detectedTextBlocks = await textDetector.detect(theImage);
        for (const textBlock of detectedTextBlocks) {
          console.log(
              `text @ (${textBlock.boundingBox.x}, ${textBlock.boundingBox.y}), ` +
              `size ${textBlock.boundingBox.width}x${textBlock.boundingBox.height}`);
        }
      } catch (e) {
        console.error("Text Detection failed, boo.", e);
      }
    })();
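Because detectors may hold significant resources, § 2.2 suggests reusing one TextDetector for several detections. One possible way to do that, sketched here under assumptions, is to remember the Promise returned by the factory and hand the same one to every caller. `makeDetectorCache` and `getDetector` are hypothetical names; the factory is passed in as an argument so the sketch does not depend on TextDetector being present.

```javascript
// Hypothetical helper: wraps a factory such as
//   () => TextDetector.create({languages: ['en']})
// so that the detector is created at most once and then shared.
function makeDetectorCache(createDetector) {
  let detectorPromise = null;
  return function getDetector() {
    if (detectorPromise === null) {
      detectorPromise = Promise.resolve(createDetector()).catch(error => {
        // Forget a failed attempt so that a later call can retry.
        detectorPromise = null;
        throw error;
      });
    }
    return detectorPromise;
  };
}
```

A page could then call `const detector = await getDetector();` before each detect() call and would pay the initialization cost only on the first one.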