The following module and engine provide the core features required to bring OCR capability to your Alfresco system. Together they use the new Alfresco Transform framework to implement an OCR engine. The engine relies on several command-line tools to extract, convert, clean, and deskew the pages of a scanned PDF document, and finally to perform optical character recognition on them. Depending on the rendition configuration, it can produce plain text, hOCR, or a PDF document with correctly positioned embedded text. Once the rendition has been created, you can use rules or other extensions to process it further.
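The description does not name the command-line tools the engine uses, so the sketch below models only the final OCR step, and only under an explicit assumption: that the OCR tool is Tesseract, whose built-in `txt`, `hocr` and `pdf` configs select the three output formats mentioned above. The class name `OcrPipelineSketch` and its methods are hypothetical, and the commands are only built, not executed; it illustrates how one OCR call per rendition type could be parameterized, not how this engine actually does it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not the engine's actual code). ASSUMPTION: the OCR step
 * is performed by Tesseract, invoked as "tesseract <image> <outputBase> <config>",
 * where the config name selects the output renderer. Commands are built only.
 */
class OcrPipelineSketch {

    /** The three rendition outputs named in the description. */
    enum OutputType {
        PLAIN_TEXT("txt", ".txt"),
        HOCR("hocr", ".hocr"),
        SEARCHABLE_PDF("pdf", ".pdf");

        final String tesseractConfig; // Tesseract config file that enables this output
        final String extension;       // extension Tesseract appends to the output base

        OutputType(String tesseractConfig, String extension) {
            this.tesseractConfig = tesseractConfig;
            this.extension = extension;
        }
    }

    /** Builds the (assumed) Tesseract command line for one cleaned, deskewed page image. */
    static List<String> ocrCommand(String image, String outputBase, OutputType type) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outputBase);
        cmd.add(type.tesseractConfig);
        return cmd;
    }

    /** Name of the file Tesseract writes for the given output base and type. */
    static String outputFile(String outputBase, OutputType type) {
        return outputBase + type.extension;
    }

    public static void main(String[] args) {
        // Print one command per rendition type for a single (hypothetical) page image.
        for (OutputType type : OutputType.values()) {
            System.out.println(String.join(" ", ocrCommand("page-001.tif", "page-001", type))
                    + "  ->  " + outputFile("page-001", type));
        }
    }
}
```

Keeping the output type as an enum means a rendition definition only has to pass one value to the engine, and the mapping to tool arguments lives in a single place.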
Alfresco Platform JAR Module & Alfresco Transform Engine
Alfresco Java Public API & Alfresco Transform Engine Base
ACS Platform: JAR Module (sideloaded) & rendition configuration; Alfresco Transform Engine: Spring Boot app or Docker Container
This is an early version of this product. Although it is fully functional and stable, it focuses strictly on performing OCR. The module is expected to be expanded significantly with new features based on feedback.