We are new to Alfresco Community.
When we scan paper documents they are automatically OCR'ed and we are given the option to change the filename to our chosen naming convention. We then drag and drop the file into Alfresco. This works very well!
When our users have digital files they want to drag and drop into Alfresco, we would like to automate the following process:
1) Determining if the file needs to be OCR'ed.
2) If it needs to be OCR'ed - do it.
3) Allow us to verify that the filename matches our naming convention. If not, give us the option to change it.
Is there an Alfresco plugin that will do this for us? If not, what software do you suggest we use to do this?
We have tried OCR software and have found it to work well. The problem is that it takes several manual steps (and some computer savvy) to OCR a file before dropping it into Alfresco. If users forget to OCR a file first, the document is not searchable. We would like this process to be as simple (and foolproof) as possible.
What do you suggest?
Hi,
I had a similar case: documents were scanned, OCR'ed and saved to a particular folder on the filesystem (a shared folder). An application watched that folder, extracted metadata, renamed the files and then uploaded them to Alfresco using CMIS.
In Alfresco these documents were classified using content rules and scripts, depending on filename and metadata.
So maybe you can develop an external application that does everything you need before uploading the file to Alfresco using CMIS.
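To make the idea of such an external application more concrete, here is a minimal sketch in Python. It only shows the folder sweep; the name check and the upload are passed in as functions, because the real naming convention and the CMIS client are not described in this thread. `sweep_drop_folder`, `is_valid_name` and `upload` are hypothetical names chosen for illustration, not part of Alfresco or any CMIS library.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def sweep_drop_folder(
    drop_folder: Path,
    is_valid_name: Callable[[str], bool],
    upload: Callable[[Path], None],
) -> Tuple[List[Path], List[Path]]:
    """Upload files whose names pass the check; leave the rest in place.

    Returns (uploaded, rejected) so the caller can report which files
    still need to be renamed by a user.
    """
    uploaded: List[Path] = []
    rejected: List[Path] = []
    for path in sorted(drop_folder.iterdir()):
        if not path.is_file():
            continue  # ignore sub-folders
        if is_valid_name(path.name):
            upload(path)  # hypothetical hook: a CMIS upload call would go here
            uploaded.append(path)
        else:
            rejected.append(path)
    return uploaded, rejected
```

The rejected list is what you would show to the user (or mail to them) so they can fix the filename and drop the file again.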
Regards,
clv
...there are so many possibilities... :-)
Do you use only one scanner or several?
Do you upload to a specific "inBox" folder or anywhere in Alfresco?
What OCR software/scanner are you using? Maybe it has some kind of scripting facility or an API you can hook code into?
The simplest approach for me is:
- use a scanner with an OCR facility
- upload the scanned documents to an "inBox" folder (using a "post-scan" script)
- in Alfresco: check the naming convention and "is there text to extract" via a "created" rule on the "inBox" folder
- in Alfresco: move the document to a folder depending on the naming convention (or raise an exception/move it to an error folder in the rule if the name isn't valid or no text could be extracted)
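The routing decision in the last two steps could look roughly like this. Inside Alfresco it would live in a rule script; the sketch below uses Python only to show the logic. The naming convention (`<department>_<YYYY-MM-DD>_<title>.pdf`), the folder paths and the function name `target_folder` are all assumptions made up for this example, since the actual convention is not given in this thread.

```python
import re

# Hypothetical convention: <department>_<YYYY-MM-DD>_<title>.pdf
NAME_PATTERN = re.compile(
    r"(?P<dept>[A-Za-z]+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<title>[\w-]+)\.pdf"
)

ERROR_FOLDER = "/inBox/errors"  # hypothetical path


def target_folder(filename: str, has_text: bool) -> str:
    """Pick a destination folder, or the error folder if a check fails."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None or not has_text:
        # invalid name, or no text could be extracted
        return ERROR_FOLDER
    year = match.group("date")[:4]
    return f"/{match.group('dept').lower()}/{year}"
```

With these assumptions, `target_folder("HR_2020-03-15_contract.pdf", True)` gives `/hr/2020`, while a file named `scan001.pdf` ends up in the error folder.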
Thank you! That is very helpful. I will look into CMIS, scripts and creating rules. That sounds like it may just be the ticket.
Paper that we scan goes smoothly into Alfresco.
It is the files on users' computers, which they drop into the system, that are causing problems. They assume that since a file is a PDF it has been OCR'ed. This may or may not be the case. If we get a bunch of unsearchable documents into our system, users will not be able to find them later and the value of the EDMS breaks down.
Short of threatening them, how can we set it up so that only OCR'ed documents go into the system?
If your OCR step can also set a property, that's probably easiest. Then you can have a rule check for the presence of that property.
Alternatively, the rule could do a transform to text. If the result is empty you know it wasn't OCR'd so you move the document to an exception folder or send an email or something.
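As a rough illustration of the "is the result empty" test, assuming you already have the output of the text transform as a string: counting only letters and digits avoids being fooled by whitespace or form feeds that some transforms emit for image-only pages. The function name `looks_unocred` and the threshold of 20 characters are arbitrary choices for this sketch, not anything Alfresco defines.

```python
def looks_unocred(extracted_text: str, min_chars: int = 20) -> bool:
    """Return True when the transform produced too little text to be searchable."""
    # Keep only letters and digits so stray whitespace does not count as text.
    meaningful = "".join(ch for ch in extracted_text if ch.isalnum())
    return len(meaningful) < min_chars
```

You would probably want to tune `min_chars` against a few of your own documents, since very short but legitimate documents would otherwise be flagged too.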