I run a dockerized Alfresco with the following components:
Have you confirmed that the PDFs are not image PDFs? In other words, are they PDFs wrapped around an image file, or did the scanner run OCR and put the text into the PDF file? Have you also confirmed that if you take the scanner output and upload it manually, it goes through as expected?
Hello,
Thank you for your answer.
Yes, the scanner performs OCR and puts the text into the PDF. This is confirmed because, as I wrote, the very same file becomes searchable after a simple download (to the workstation) followed immediately by an upload into Alfresco through the Share interface.
It looks like a PDF with text content doesn't fire the metadata extracter when it is sent by the scanner through the CIFS interface, but everything is okay when the same file is uploaded to Alfresco via Share by the user...
I'm not sure it's related to the strange extracter behavior I described, but I get the following debug logs during Alfresco startup:
alfresco_1 | 2019-10-21 15:24:41,219 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [localhost-startStop-1] Loaded mapping properties from resource: alfresco/metadata/TikaAutoMetadataExtracter.properties
alfresco_1 | 2019-10-21 15:24:41,222 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [localhost-startStop-1] No explicit embed mapping properties found at: alfresco/metadata/TikaAutoMetadataExtracter.embed.properties, assuming reverse of extract mapping
alfresco_1
Any ideas?
I'm still looking into this. Just to make completely sure - you do the exact same thing manually (same folder, same repository, same user, etc.) and it works?
Hello,
Thanks for your reply. Yes.
I just checked again:
First step:
The PDF file is sent to Alfresco (to the folder "Numerisation") via CIFS --> no DEBUG metadata extracter log, and the PDF is not searchable.
The Fujitsu N7100 scanner logs into Alfresco as user "xxx".
Second step:
User "xxx" (not the scanner but a real person) is connected to Alfresco and performs the following actions:
from the folder "Numerisation", the user downloads the file to a Windows 10 workstation via Share (Firefox browser)
then, immediately afterwards, the user uploads the same file to Alfresco via Share (Firefox browser)
---> this fires the metadata extracter according to the DEBUG log, and the PDF becomes searchable:
alfresco_1 | 2019-10-22 14:08:30,582 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-38] Starting metadata extraction:
alfresco_1 | reader: ContentAccessor[ contentUrl=store://2019/10/22/14/8/99524f4b-2ace-4202-8bd8-b83933f7edf9.bin, mimetype=application/pdf, size=405868, encoding=UTF-8, locale=fr]
alfresco_1 | extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@3839fcb4
alfresco_1 | 2019-10-22 14:08:30,584 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-38] Concurrent extractions : 0
alfresco_1 | 2019-10-22 14:08:30,585 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-38] New extraction accepted. Concurrent extractions : 1
alfresco_1 | 2019-10-22 14:08:30,601 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-38] Extraction finalized. Remaining concurrent extraction : 0
alfresco_1 | 2019-10-22 14:08:30,602 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-38] Converted extracted raw values to system values:
alfresco_1 | Raw Properties: {date=2019-10-22T15:06:30Z, pdfDFVersion=1.3, TIKA_PARSER_PARSE_SHAPES=false, xmp:CreatorTool=N7100 1.0, comments=null, dc:subject=null, meta:creation-date=2019-10-22T15:06:30Z, created=2019-10-22T15:06:30Z, author=null, MetadataDate=D:20191022160630+01'00', xmpTPg:NPages=1, Creation-Date=2019-10-22T15:06:30Z, dcterms:created=2019-10-22T15:06:30Z, Last-Modified=2019-10-22T15:06:30Z, dcterms:modified=2019-10-22T15:06:30Z, dc:format=application/pdf; version=1.3, title=null, Last-Save-Date=2019-10-22T15:06:30Z, meta:save-date=2019-10-22T15:06:30Z, pdf:encrypted=false, producer=PFU PDF Library 1.0, modified=2019-10-22T15:06:30Z, Content-Type=application/pdf}
alfresco_1 | System Properties: {{http://www.alfresco.org/model/content/1.0}created=2019-10-22T15:06:30Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null}
alfresco_1 | 2019-10-22 14:08:30,603 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-38] Extracted Metadata from ContentAccessor[ contentUrl=store://2019/10/22/14/8/99524f4b-2ace-4202-8bd8-b83933f7edf9.bin, mimetype=application/pdf, size=405868, encoding=UTF-8, locale=fr]
alfresco_1 | Found: {date=2019-10-22T15:06:30Z, pdfDFVersion=1.3, TIKA_PARSER_PARSE_SHAPES=false, xmp:CreatorTool=N7100 1.0, comments=null, dc:subject=null, meta:creation-date=2019-10-22T15:06:30Z, created=2019-10-22T15:06:30Z, author=null, MetadataDate=D:20191022160630+01'00', xmpTPg:NPages=1, Creation-Date=2019-10-22T15:06:30Z, dcterms:created=2019-10-22T15:06:30Z, Last-Modified=2019-10-22T15:06:30Z, dcterms:modified=2019-10-22T15:06:30Z, dc:format=application/pdf; version=1.3, title=null, Last-Save-Date=2019-10-22T15:06:30Z, meta:save-date=2019-10-22T15:06:30Z, pdf:encrypted=false, producer=PFU PDF Library 1.0, modified=2019-10-22T15:06:30Z, Content-Type=application/pdf}
alfresco_1 | Mapped and Accepted: {{http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null}
alfresco_1 | 2019-10-22 14:08:30,605 DEBUG [content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-38] Completed metadata extraction:
alfresco_1 | reader: ContentAccessor[ contentUrl=store://2019/10/22/14/8/99524f4b-2ace-4202-8bd8-b83933f7edf9.bin, mimetype=application/pdf, size=405868, encoding=UTF-8, locale=fr]
alfresco_1 | extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@3839fcb4
alfresco_1 | changed: {{http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null}
Thank you for your support. :-)
Another question: the PDF does show up when added via CIFS, correct? I am wondering whether it is a permission or user issue. Is CIFS working in other situations, for other operations?
Hello,
A network folder is configured in the network scanner. This network folder is a folder within Alfresco. The scanner sends the pdf via the CIFS interface to Alfresco. So we can say that the document is pushed by the scanner to Alfresco.
Alfresco receives the document, and the user can then read it through the Share interface, but the metadata doesn't show up when the user performs a search (also through the Share interface).
Surprisingly, if the user downloads the same PDF file to the workstation (with the Share interface) and then uploads the same file again (with the Share interface), the metadata does show up in search.
Thank you for your help.
Best regards.
Thank you all for your time.
I'd like to close this issue, for the following reason:
CIFS doesn't work the way I need it to, and I found out that it is going to be removed:
https://hub.alfresco.com/t5/alfresco-content-services-blog/architecture-changes-for-alfresco-content...
So I switched to Alfresco 6.1 using https://github.com/Alfresco/alfresco-docker-installer
Then I configured the good old FTP interface. FTP works like a charm. I just added this to JAVA_OPTS in docker-compose.yml:
-Dftp.enabled=true
-Dftp.port=2121
I also added the port to the alfresco service, then edited alfresco/Dockerfile and added this line at the bottom:
EXPOSE 2121
I had to use ACTIVE communication mode in the client in order to connect to Alfresco.
The files received by Alfresco are readable AND the metadata is searchable with the Share interface. All good.
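For anyone landing here later, the changes described above might look roughly like this in docker-compose.yml. This is only a sketch: the post does not show the actual port mapping, so the ports entry (host 2121 to container 2121) is an assumption, as is the surrounding layout of the alfresco service. Keep whatever other JAVA_OPTS your generated file already contains.

```yaml
# docker-compose.yml (excerpt): hypothetical layout, merge into your generated file
services:
  alfresco:
    build:
      context: ./alfresco        # the Dockerfile here gets the extra "EXPOSE 2121" line
    environment:
      # Only the two FTP options come from the post; existing options are omitted here.
      JAVA_OPTS: "
        -Dftp.enabled=true
        -Dftp.port=2121
        "
    ports:
      - "2121:2121"              # assumed mapping: FTP control port, host -> container
```

Active mode probably mattered because only the control port is published here: in active mode the server opens the data connection back to the client, whereas passive mode would also need a range of data ports published on the container.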