alfresco-simple-ocr error on macOS

Hello,

I have installed the alfresco-simple-ocr addon by keensoft and all its dependencies (ocrmypdf, tesseract, imagemagick). When I try to OCR a PDF, it throws this error:

Exception in thread "defaultAsyncAction1" java.lang.RuntimeException: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 01190024 Failed to perform OCR transformation:
Execution result:
   os:         Mac OS X
   command:    /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra --output-type pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757.pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757_ocr.pdf
   succeeded:  false
   exit code:  1
   out:
   err:        Traceback (most recent call last):
  File "/usr/local/bin/ocrmypdf", line 5, in <module>
    from ocrmypdf.__main__ import run
  File "/usr/local/Cellar/ocrmypdf/9.5.0/libexec/lib/python3.7/site-packages/ocrmypdf/__init__.py", line 18, in <module>

	at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:196)
	at es.keensoft.alfresco.ocr.OCRExtractAction.access$400(OCRExtractAction.java:39)
	at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:177)
	at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:174)
	at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
	at es.keensoft.alfresco.ocr.OCRExtractAction.executeInNewTransaction(OCRExtractAction.java:182)
	at es.keensoft.alfresco.ocr.OCRExtractAction.access$300(OCRExtractAction.java:39)
	at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask$1$1.doWork(OCRExtractAction.java:159)
	at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask$1$1.doWork(OCRExtractAction.java:156)
	at org.alfresco.repo.tenant.TenantUtil.runAsWork(TenantUtil.java:126)
	at org.alfresco.repo.tenant.TenantUtil.runAsTenant(TenantUtil.java:95)
	at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask$1.doWork(OCRExtractAction.java:155)
	at org.alfresco.repo.security.authentication.AuthenticationUtil.runAs(AuthenticationUtil.java:588)
	at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask.run(OCRExtractAction.java:152)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 01190024 Failed to perform OCR transformation:
Execution result:
   os:         Mac OS X
   command:    /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra --output-type pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757.pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757_ocr.pdf
   succeeded:  false
   exit code:  1
   out:
   err:        Traceback (most recent call last):
  File "/usr/local/bin/ocrmypdf", line 5, in <module>
    from ocrmypdf.__main__ import run
  File "/usr/local/Cellar/ocrmypdf/9.5.0/libexec/lib/python3.7/site-packages/ocrmypdf/__init__.py", line 18, in <module>

	at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:86)
	at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:194)
	... 16 more
Caused by: org.alfresco.service.cmr.repository.ContentIOException: 01190024 Failed to perform OCR transformation:
Execution result:
   os:         Mac OS X
   command:    /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra --output-type pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757.pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757_ocr.pdf
   succeeded:  false
   exit code:  1
   out:
   err:        Traceback (most recent call last):
  File "/usr/local/bin/ocrmypdf", line 5, in <module>
    from ocrmypdf.__main__ import run
  File "/usr/local/Cellar/ocrmypdf/9.5.0/libexec/lib/python3.7/site-packages/ocrmypdf/__init__.py", line 18, in <module>

	at es.keensoft.alfresco.ocr.OCRTransformWorker.transform(OCRTransformWorker.java:79)
	... 17 more

If I run the same command manually in my terminal, it works without any issue:

/usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l spa+eng+fra --output-type pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757.pdf /Users/erickbrenes/Documents/dev/alfresco526/alfresco/tomcat/temp/Alfresco/OCRTransformWorker_source_7618353995588703757_ocr.pdf
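
A command that works in an interactive terminal but fails when launched by Tomcat often points to a difference in the process environment (PATH, library search paths, locale) rather than to the command itself. The following is a minimal sketch for testing that idea, not part of the addon: `run_with_env` is a hypothetical helper that runs a command with exactly the environment you pass in, so the terminal case and a stripped-down environment can be compared. The reduced `PATH` in the demo is an assumption, not what Alfresco actually passes.

```python
import os
import subprocess
import sys


def run_with_env(cmd, env):
    """Run cmd with exactly the given environment and capture its output."""
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    ocrmypdf = "/usr/local/bin/ocrmypdf"  # path taken from the log above
    if os.path.exists(ocrmypdf):
        # Assumed minimal environment, standing in for what a service might see.
        minimal_env = {"PATH": "/usr/bin:/bin"}
        for label, env in (("terminal", dict(os.environ)), ("minimal", minimal_env)):
            code, out, err = run_with_env([ocrmypdf, "--version"], env)
            print(label, "exit code:", code)
            print(out or err)
    else:
        print("ocrmypdf not found at", ocrmypdf, file=sys.stderr)
```

If the "minimal" run fails with a traceback while the "terminal" run succeeds, the difference between the two environments is the place to look.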

Does this addon work on macOS?

The log above points to this file:

/usr/local/Cellar/ocrmypdf/9.5.0/libexec/lib/python3.7/site-packages/ocrmypdf/__init__.py

This is the file content:

# © 2017 James R. Barlow: github.com/jbarlow83
#
# This file is part of OCRmyPDF.
#
# OCRmyPDF is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# OCRmyPDF is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with OCRmyPDF.  If not, see <http://www.gnu.org/licenses/>.

from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo
from ._version import PROGRAM_NAME, __version__
from .api import Verbosity, configure_logging, ocr
from .exceptions import (
    BadArgsError,
    DpiError,
    EncryptedPdfError,
    ExitCode,
    ExitCodeException,
    InputFileError,
    MissingDependencyError,
    OutputFileAccessError,
    PdfMergeFailedError,
    PriorOcrFoundError,
    SubprocessOutputError,
    TesseractConfigError,
    UnsupportedImageFormatError,
)
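
The traceback in the log is cut off right after line 18 of `__init__.py`. Assuming the content pasted above starts at line 1 of that file, line 18 is the `from . import helpers, hocrtransform, leptonica, pdfa, pdfinfo` statement, so the real error is probably raised while one of those submodules is being imported. A small sketch to recover the full traceback is shown below; `import_report` is a hypothetical helper, and it only tells you something useful if it is run with the same Python interpreter and environment that Alfresco uses to start ocrmypdf.

```python
import importlib
import traceback


def import_report(module_name):
    """Try to import module_name; return None on success, else the full traceback text."""
    try:
        importlib.import_module(module_name)
        return None
    except Exception:
        return traceback.format_exc()


if __name__ == "__main__":
    report = import_report("ocrmypdf")
    print(report or "ocrmypdf imported without errors")
```

Unlike the truncated log, the returned text includes the innermost frame and the exception message, which usually names the missing library or module.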

Thanks

Accepted Solutions

Re: alfresco-simple-ocr error on macOS

I have fixed this issue. I had to set the correct ImageMagick properties in alfresco-global.properties:

### ImageMagick Config ###
img.root=/usr/local/Cellar/imagemagick/7.0.9-23
# ----> I had this property wrong, 'img.dyn' <----
img.dyn=${img.root}/lib
img.exe=${img.root}/bin/convert
img.gslib=/usr/local/Cellar/ghostscript/9.50/lib
#img.coders=${img.root}/modules/coders
#img.config=${img.root}/config

#GS executable
ghostscript.exe=gs

#Tesseract executable
tesseract.exe=tesseract

# OCRmyPDF

# running 'which ocrmypdf' returns '/usr/local/bin/ocrmypdf', I think this value could be used as well
ocr.command=/usr/local/Cellar/ocrmypdf/9.5.0/bin/ocrmypdf
ocr.output.verbose=true
ocr.output.file.prefix.command=
ocr.extra.commands=--verbose 1 --force-ocr -l spa+eng+fra --output-type pdf
ocr.server.os=linux
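
Since the fix came down to one path property being wrong, a quick check that every path-like property points at something that exists can catch this kind of mistake early. The sketch below is an illustration only: `parse_properties`, `resolve`, and `missing_paths` are hypothetical helpers, and the `${name}` expansion merely imitates simple placeholder substitution, not Alfresco's actual property resolution.

```python
import os
import re


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve(props, key):
    """Expand ${name} placeholders in props[key] using other entries in props."""
    value = props[key]
    for _ in range(10):  # small bound guards against circular references
        expanded = re.sub(
            r"\$\{([^}]+)\}",
            lambda m: props.get(m.group(1), m.group(0)),
            value,
        )
        if expanded == value:
            break
        value = expanded
    return value


def missing_paths(props, keys):
    """Return {key: resolved_path} for the given keys whose path does not exist."""
    missing = {}
    for key in keys:
        if key in props:
            path = resolve(props, key)
            if not os.path.exists(path):
                missing[key] = path
    return missing


if __name__ == "__main__":
    sample = """
img.root=/usr/local/Cellar/imagemagick/7.0.9-23
img.dyn=${img.root}/lib
img.exe=${img.root}/bin/convert
ocr.command=/usr/local/Cellar/ocrmypdf/9.5.0/bin/ocrmypdf
"""
    props = parse_properties(sample)
    for key, path in missing_paths(props, ["img.root", "img.dyn", "img.exe", "ocr.command"]).items():
        print("missing:", key, "->", path)
```

Only keys whose values are absolute paths are worth passing in; entries such as `ghostscript.exe=gs` are resolved through `PATH` and would be reported as missing by this check.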

 

3 Replies

Re: alfresco-simple-ocr error on macOS

I'd suggest using the Dockerized version, which can be produced with:

https://github.com/Alfresco/alfresco-docker-installer

 

Software Engineer in Alfresco Search Team.

Re: alfresco-simple-ocr error on macOS

Hello Angel Borroy,

I'm using ACS 5.2.6 Enterprise.
