I have encountered an Alfresco 7.2.1 containerized instance in operation as a service layer for an application. The Alfresco UIs are not used, but instead its API is used by the custom application to push, pull and search documents.
The system holds millions of documents. When a document is pushed to Alfresco, it is pushed as a child of a common parent workspace node. For the purpose of organization, let's say node 27 has millions of direct children defined in ALF_CHILD_ASSOC.
When such a new child document is added, Solr seems to initiate a bulk download of all children of node 27. Eventually this call times out, yielding the errors:
SolrInformationServer
Bulk indexing failed, do one node at a time. See the stacktrace below for further details.
SolrInformationServer Unable to get nodes metadata from repository using fromNodeId=27, toNodeId=27, nodeIds=null, fromTxId=null, toTxId=null, txIds=null. See the stacktrace below for further details.
At this point Solr seems to resort to pulling every single child document of 27, one by one, to reconcile them with its index.
The load on the content server keeps increasing throughout this operation, and in this particular sample reaches upwards of 50GB of memory consumption (and that is after massively increasing the memory allocated to the container). Memory exhaustion then leads to endless garbage collection and churn.
Is this normal and expected behaviour? I understand that Solr would need to understand a new child in the context of a parent, but the massive loading seems suboptimal. Is the use of Alfresco in this manner (with a single level) a problem, and would a file hierarchy prevent this?
Any assistance on this would be hugely valued.
I'm not sure exactly how the system is set up in your environment or how many resources you have configured, but in general you should never keep that many nodes in a single folder. It definitely has an impact. Ideally, no more than 2K-3K nodes are advised in one folder at the same level; you should implement bucketing instead.
Take a look at this thread as well: https://hub.alfresco.com/t5/alfresco-content-services-forum/is-there-still-a-limit-on-the-amount-of-...
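To make the bucketing suggestion concrete, here is a minimal sketch of one possible scheme: derive a short folder path from a hash of the document's key, so that uploads spread evenly across many folders instead of landing under one parent. The function name `bucket_path`, its parameters, and the choice of SHA-256 are illustrative assumptions, not something Alfresco prescribes; date-based folders (year/month/day) are another common option.

```python
import hashlib


def bucket_path(doc_key: str, levels: int = 2, width: int = 2) -> str:
    """Return a relative folder path such as 'e3/b0' for a document key.

    Hypothetical helper: with the defaults, every folder level has at most
    16**width = 256 children, and there are 256**levels = 65,536 leaf folders.
    """
    # Hash the key so documents are spread evenly regardless of naming patterns.
    digest = hashlib.sha256(doc_key.encode("utf-8")).hexdigest()
    # Take the first `levels` chunks of `width` hex characters as folder names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(parts)


if __name__ == "__main__":
    # The same key always maps to the same folder, so lookups stay cheap.
    print(bucket_path("invoice-000123.pdf"))
```

With the defaults, 10 million documents would average roughly 150 per leaf folder, well under the 2K-3K guideline. The custom application would create the document under this path rather than directly under node 27. As far as I recall, the v1 REST API's create-node call accepts a `relativePath` field that creates missing folders, but please verify that against the API documentation for your version.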
When you add a new child node to a folder, both the new node (its content, metadata and ACL) and its parent node (the folder's update timestamp, metadata and ACL) are re-indexed.