Solved: convert scanned pdf into searchable pdf - Alfresco...

anuradha1 · ‎13 Mar 2019

Hi,

I have been using Alfresco for a long time. I scanned around 100,000 documents and uploaded them into Alfresco. But I have now run into a problem: the documents cannot be searched by their content. If I scan a document with a scanner that has OCR, it can be. But I don't have OCR on every scanner, so I need to integrate an OCR module into Alfresco. I tried Tesseract OCR and Simple OCR, but neither worked.

If anyone knows, please tell me another way to do this, or the correct way to integrate tesseract-ocr or simple-ocr.
I also need to convert all the uploaded documents into searchable PDFs.

Thanks

sercama · ‎13 Mar 2019

Hi anuradha madhushani

Do you know this addon?

GitHub - keensoft/alfresco-simple-ocr: Simple OCR action for Alfresco

You can configure a rule that executes the OCR action from this addon.

Regards.

anuradha1 · ‎21 Mar 2019

I tried it, but it is not working. I am on CentOS 7, and I think pdfsandwich is not supported there. OCRmyPDF is also not installed, and I can't find out whether it is supported on CentOS 7 or not. What can I do?

Thank you.

angelborroy · ‎21 Mar 2019

You can take this as base:

https://github.com/keensoft/alfresco-simple-ocr/blob/master/docker/pdfsandwich-1.6-centos-7/Dockerfi...

Hyland Developer Evangelist
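
Before wiring an OCR tool into the action, it can help to confirm that the binaries are visible at all. The following is a minimal sketch, not part of alfresco-simple-ocr: the helper name `find_ocr_tools` is hypothetical, and the tool names are simply the ones mentioned in this thread. It only inspects the PATH of the process that runs it, which may differ from the PATH seen by Alfresco's Tomcat.

```python
import shutil


def find_ocr_tools(names=("pdfsandwich", "ocrmypdf", "tesseract")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    # Print one line per tool so missing binaries are easy to spot.
    for name, path in find_ocr_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Running it from the same account that starts Alfresco gives a more representative answer than running it from an interactive admin shell.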

anuradha1 · ‎27 Mar 2019

OK, I have successfully installed pdfsandwich as well. After restarting Alfresco, it displays the OCR button. When I click it, the message below appears, but nothing happens.

Please help me. I am in big trouble now.

anuradha1 · ‎2 Jun 2019

Thank you to everyone who helped me. I got it working with simple-ocr on my Linux installation. But now I need to do the same with an Alfresco installation on Windows. Does anyone know a way to do that?

Any help will be appreciated.

Thank you.

Zizou27 · ‎5 Jan 2020

Dear @anuradha1

Have you fixed this problem? I still have this issue. Thank you for your help.

Zizou27 · ‎8 Jan 2020

Hello @angelborroy

I have this issue: I'm using OCRmyPDF with Alfresco. OCRmyPDF works well when I run the command manually, but when I use it through the Alfresco 'OCR action' it doesn't work. This is the log:

Caused by: java.lang.RuntimeException: org.alfresco.service.cmr.repository.ContentIOException: 000817996 Failed to perform OCR transformation:
Execution result:
os: Linux
command: /usr/local/bin/ocrmypdf --verbose 1 --force-ocr -l eng /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_4887267237326407155.pdf /opt/alfresco-community/tomcat/temp/Alfresco/OCRTransformWorker_source_4887267237326407155_ocr.pdf
succeeded: false
exit code: 1
out:
err: Traceback (most recent call last):
File "/usr/local/bin/ocrmypdf", line 5, in <module>
from ocrmypdf.__main__ import run
File "/root/.local/lib/python3.6/site-packages/ocrmypdf/__init__.py", line 20, in <module>
from .api import Verbosity
at es.keensoft.alfresco.ocr.OCRExtractAction.executeImplInternal(OCRExtractAction.java:183)
at es.keensoft.alfresco.ocr.OCRExtractAction.access$200(OCRExtractAction.java:38)
at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:164)
at es.keensoft.alfresco.ocr.OCRExtractAction$1.execute(OCRExtractAction.java:161)
at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:464)
at es.keensoft.alfresco.ocr.OCRExtractAction.executeInNewTransaction(OCRExtractAction.java:169)
at es.keensoft.alfresco.ocr.OCRExtractAction.access$100(OCRExtractAction.java:38)
at es.keensoft.alfresco.ocr.OCRExtractAction$ExtractOCRTask.run(OCRExtractAction.java:151)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Thanks for your help.
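
The Python traceback in the log above stops before the final exception, so the actual cause is not visible in what was posted. One way to get further, sketched here under assumptions, is to rerun the logged command while capturing the complete stderr, and to list which Python files the traceback goes through (in the log, the package is loaded from `/root/.local/lib/python3.6/site-packages`). The helper names `run_and_capture` and `traceback_files` are hypothetical and not part of alfresco-simple-ocr or OCRmyPDF. The `subprocess` arguments are chosen to work on Python 3.6, the version that appears in the log.

```python
import re
import subprocess
import sys


def run_and_capture(command):
    """Run a command (list of arguments) and return (exit_code, stdout, stderr) as text."""
    proc = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text; available on Python 3.6
    )
    return proc.returncode, proc.stdout, proc.stderr


def traceback_files(text):
    """Return the file paths named in 'File "..."' lines of a Python traceback."""
    return re.findall(r'File "([^"]+)"', text)
```

For instance, calling `run_and_capture(["/usr/local/bin/ocrmypdf", "--version"])` from the account that runs Tomcat, and again from the shell where the manual command works, would show whether both environments load the same installation. Comparing the two stderr outputs is the point of the exercise; the sketch does not identify the cause by itself.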

redouane · ‎9 Apr 2021

I have the same problem on Windows! How can I solve it?

convert scanned pdf into searchable pdf - Alfresco Community 5.2
