Hey guys!
I am unable to index large PDF files.
Version: Alfresco Community 6.1.1
OS: Ubuntu Linux 18.04
Here is the error message from catalina.out:
2020-10-01 17:03:28,779 WARN [content.metadata.AbstractMappingMetadataExtracter] [http-nio-8080-exec-41] Metadata extraction rejected:
Extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@758471b1
Reason: Max doc size exceeded 10.0 MB
2020-10-01 17:03:29,193 WARN [content.metadata.AbstractMappingMetadataExtracter] [http-nio-8080-exec-28] Metadata extraction rejected:
Extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@758471b1
Reason: Max doc size exceeded 10.0 MB
I read the documentation at
https://docs.alfresco.com/6.1/references/dev-extension-points-content-transformer.html
and added the following to alfresco-global.properties:
content.transformer.PdfBox.priority = 110
content.transformer.PdfBox.extensions.pdf.txt.priority = 50
content.transformer.PdfBox.extensions.pdf.txt.maxSourceSizeKBytes = 25600
However, it still didn't work.
Can you help please?
With best regards,
Cross-posting: https://hub.alfresco.com/t5/alfresco-content-services-forum/increase-max-file-size-that-solr-indexes...
Hi angelborroy,
I had seen that documentation.
I applied the parameters below in alfresco-global.properties.
I restarted the Alfresco service.
It still didn't work.
Can you help?
Thanks a lot.
content.transformer.default.timeoutMs=180000
content.transformer.default.txt.*.maxSourceSizeKBytes=1048576
content.transformer.JodConverter.maxSourceSizeKBytes=102400
log4j.logger.org.alfresco.repo.content.transform.TransformerDebug=DEBUG
content.metadataExtracter.pdf.maxDocumentSizeMB=1000
content.metadataExtracter.default.timeoutMs=3625000
content.transformer.PdfBox.priority=110
content.transformer.PdfBox.extensions.pdf.txt.priority=50
content.transformer.PdfBox.extensions.pdf.txt.maxSourceSizeKBytes=25600
content.transformer.json2html.priority=30
content.transformer.json2html.extensions.json.html.supported=true
content.transformer.json2html.extensions.json.html.priority=30
Well, not really cross-posting as the OP is different. But the answer in the other thread is definitely spot on for a similar issue with transformers. What is not mentioned in the other thread is that the transformer config is also documented.
But in this case we are talking about metadata extractors, and these have separately configured limits. In fact, the PdfBox extractor is about the only one that has a configured limit, set via the global property content.metadataExtracter.pdf.maxDocumentSizeMB.
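
One way to check whether that property is actually being picked up is to compare the limit named in the warning with the value in alfresco-global.properties. The following is only a rough Python sketch: the function names are made up for illustration, the log pattern is taken from the warning quoted in the first post, and the properties parsing is deliberately simplified (it only handles plain key=value lines).

```python
import re

# Matches the "Reason:" line of the rejection warning quoted from catalina.out.
LIMIT_RE = re.compile(r"Reason: Max doc size exceeded ([\d.]+) MB")

# Property named in this thread for the PdfBox metadata extractor limit.
PDF_LIMIT_KEY = "content.metadataExtracter.pdf.maxDocumentSizeMB"


def reported_limits_mb(log_text):
    """Return the distinct size limits (in MB) named in rejection warnings."""
    return sorted({float(m.group(1)) for m in LIMIT_RE.finditer(log_text)})


def configured_limit_mb(properties_text, key=PDF_LIMIT_KEY):
    """Return the last uncommented value of `key`, or None if it is absent.

    Simplified parsing: only plain `key=value` lines are handled.
    """
    value = None
    for line in properties_text.splitlines():
        line = line.strip()
        if line.startswith(("#", "!")) or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        if name.strip() == key:
            value = float(raw.strip())
    return value


if __name__ == "__main__":
    sample_log = "Reason: Max doc size exceeded 10.0 MB\n"
    sample_props = "content.metadataExtracter.pdf.maxDocumentSizeMB=1000\n"
    print("limit seen in log:", reported_limits_mb(sample_log))
    print("limit configured:", configured_limit_mb(sample_props))
```

If warnings written after the restart still name a different limit than the configured value, that would suggest the setting is not reaching the extractor (for example, a different properties file is being loaded), though that is an assumption to verify against your own installation.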