Important
As of 2024-03-15, this repo is archived and kept for reference only. Clearly Local users, please see:
- New demo site at https://cl-tools.deno.dev/pages/easylt-ocr/samples/ocr-sample-editor-poc.html (requires Clearly Local Toolkit account)
- New repo at https://github.com/clearlylocal/easylt-ocr (requires Clearly Local GitHub access)
import { OcrSdk } from 'path/to/OcrSdk.ts'
// credentials obtained from https://cloud.ocrsdk.com/Account/Register
const ocrSdk = new OcrSdk({
	applicationId: Deno.env.get('ABBYY_APPLICATION_ID')!, // e.g. 7ea53f47-8bbc-477b-b17c-989a3184c363
	password: Deno.env.get('ABBYY_PASSWORD')!, // e.g. n6WL0rCFlhU9bDXDri6AQEZV
	serviceUrl: Deno.env.get('ABBYY_SERVICE_URL')!, // e.g. https://cloud-eu.ocrsdk.com/
})
// OCR the raw image bytes; the result is keyed by the requested export formats
const { txt } = await ocrSdk.ocr(await Deno.readFile('input.jpg'), {
	languages: ['English'],
	exportFormats: ['txt'],
})
await Deno.writeFile('output.txt', new Uint8Array(await txt.arrayBuffer()))
To run the CLI, note that the relevant ABBYY_APPLICATION_ID, ABBYY_PASSWORD, and ABBYY_SERVICE_URL must be available as environment variables.
# view help
deno task cli --help
# `convert` command, specifying output formats (default "txt")
deno task cli convert path/to/image.jpg -o txt -o xml
# `html`/`json` commands
deno task cli html path/to/image.jpg
deno task cli json path/to/image.jpg
# specify languages (default "English")
deno task cli json path/to/image.jpg -l ChinesePRC -l English
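Because the CLI depends on the three ABBYY_* environment variables, it can help to check them up front and report every missing one at once. The following is a minimal sketch, not code from this repo: `readAbbyyEnv` and `REQUIRED_VARS` are hypothetical names, and the lookup function is passed in so the helper is not tied to one runtime.

```typescript
// Hypothetical preflight helper (not part of this repo).
const REQUIRED_VARS = ['ABBYY_APPLICATION_ID', 'ABBYY_PASSWORD', 'ABBYY_SERVICE_URL'] as const

type RequiredVar = typeof REQUIRED_VARS[number]
type AbbyyEnv = Record<RequiredVar, string>

/** Reads the three variables via `getEnv`, throwing one error that lists every missing name. */
function readAbbyyEnv(getEnv: (name: string) => string | undefined): AbbyyEnv {
	const values: Partial<AbbyyEnv> = {}
	const missing: string[] = []
	for (const name of REQUIRED_VARS) {
		const value = getEnv(name)
		// Treat unset and empty values the same way
		if (value) values[name] = value
		else missing.push(name)
	}
	if (missing.length > 0) {
		throw new Error(`Missing environment variables: ${missing.join(', ')}`)
	}
	return values as AbbyyEnv
}
```

Under Deno this could be called as `readAbbyyEnv((name) => Deno.env.get(name))` before constructing `OcrSdk`.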
src/
core/
- OcrSdk.ts: The OcrSdk class, with various methods for interacting with the ABBYY Cloud OCR API. Loosely based on ABBYY's sample JS code, but with the following changes:
  - Rewritten in modern JS/TS with ES6 classes etc.
  - Promise-based API instead of callbacks
  - Calls the v2 (JSON) Cloud OCR API, not the v1 (XML) API
  - Methods now deal with raw binary data rather than file paths (file reading/writing is left up to calling code)
  - Exposes a single ocr method to OCR an image and return the output file binary in the requested format
  - Zero external dependencies
- imageMap.ts: The imageMap function, for converting XML output to an image map that can be rendered in HTML etc.
- prettifyXml.ts: The prettifyXml function, for pretty-printing XML output while preserving significant whitespace
- types.ts: TypeScript types and lists of values for interacting with OcrSdk
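To illustrate what "Promise-based API instead of callbacks" can look like for a cloud OCR task, here is a rough sketch of waiting for a task to finish. It is not taken from OcrSdk.ts: `waitForTask`, `TaskStatus`, and the option names are hypothetical, and the status strings `Completed` and `ProcessingFailed` are assumed to match the values the ABBYY API reports.

```typescript
// Hypothetical task shape; the real types in types.ts may differ.
interface TaskStatus {
	status: string
	resultUrls?: string[]
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

/** Polls `getStatus` until the task completes, fails, or `maxAttempts` is exhausted. */
async function waitForTask(
	getStatus: () => Promise<TaskStatus>,
	{ intervalMs = 2000, maxAttempts = 30 } = {},
): Promise<TaskStatus> {
	for (let attempt = 0; attempt < maxAttempts; attempt++) {
		const task = await getStatus()
		// Status names are assumptions about the API's reported values
		if (task.status === 'Completed') return task
		if (task.status === 'ProcessingFailed') throw new Error('OCR task failed')
		await sleep(intervalMs)
	}
	throw new Error(`Task did not complete after ${maxAttempts} attempts`)
}
```

Passing the status lookup in as a function keeps the polling logic independent of how the HTTP request is made, so calling code can simply `await` the result.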
cli/
functions/
- convertImage.ts: Use OcrSdk's ocr method to get text and XML files of the OCRed content
- jsonImageMap.ts: Get simplified JSON content from XML using imageMap.ts
- htmlImageMap.ts: Get XML content, then convert it to a searchable HTML image map based on the text data and coordinates of each line
- main.ts: CLI app to run the various functions
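As a rough illustration of the idea behind htmlImageMap.ts (turning each line's text and coordinates into an HTML image map), the sketch below builds a `<map>` with one rectangular `<area>` per line. It is not the repo's implementation: `OcrLine`, `toImageMapHtml`, and `escapeAttr` are hypothetical names, and the line shape is an assumption about what the simplified data might contain.

```typescript
// Hypothetical line shape: bounding box in image pixels plus the recognized text.
interface OcrLine {
	left: number
	top: number
	right: number
	bottom: number
	text: string
}

/** Escapes characters that are unsafe inside a double-quoted HTML attribute. */
function escapeAttr(value: string): string {
	return value
		.replace(/&/g, '&amp;')
		.replace(/"/g, '&quot;')
		.replace(/</g, '&lt;')
		.replace(/>/g, '&gt;')
}

/** Builds an <img> plus a <map> with one rectangular <area> per OCR line. */
function toImageMapHtml(imageSrc: string, lines: OcrLine[], mapName = 'ocr-map'): string {
	const areas = lines.map(({ left, top, right, bottom, text }) => {
		const label = escapeAttr(text)
		// For shape="rect", coords are x1,y1,x2,y2 (top-left, then bottom-right)
		return `\t<area shape="rect" coords="${left},${top},${right},${bottom}" title="${label}" alt="${label}">`
	})
	return [
		`<img src="${escapeAttr(imageSrc)}" usemap="#${mapName}">`,
		`<map name="${mapName}">`,
		...areas,
		`</map>`,
	].join('\n')
}
```

Putting the text in `title` and `alt` is one simple way to make it visible on hover; the repo's generated sample may expose the text differently.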
samples/
- ocr-sample.jpg: An example input image file
- ocr-sample-result.txt: Text file result of running convertImage.ts on ocr-sample.jpg
- ocr-sample-result.xml: XML result of running convertImage.ts on ocr-sample.jpg
- ocr-sample-image-map.html: HTML image map generated by running htmlImageMap.ts on ocr-sample.jpg
- ocr-sample-image-map.json: JSON file generated by running jsonImageMap.ts on ocr-sample.jpg