We also run PostgreSQL 13.3 on a separate server, with db_pool_max set to 300 in docker-compose.yml and the corresponding limit in the Postgres server's postgresql.conf set to 400. The Alfresco server runs Ubuntu 22.04 with 98 GB RAM and 24 cores, while the Postgres server also runs Ubuntu 22.04, with 26 GB RAM and 16 cores.
A few times a day, at irregular intervals, the alfresco container reaches 1600% CPU usage (as reported by docker stats). The container's memory limit in docker-compose.yml is 68 GB, and it still reaches these high CPU numbers.
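To put the 1600% figure in context: docker stats reports CPU relative to a single core, so 100% corresponds to one fully busy core. The small sketch below only does that arithmetic for our 24-core host; the function names are made up for illustration and are not part of any Docker or Alfresco tooling.

```python
def cores_in_use(docker_cpu_percent: float) -> float:
    """Convert a docker stats CPU % value into busy cores (100% = one core)."""
    return docker_cpu_percent / 100.0


def host_utilisation(docker_cpu_percent: float, host_cores: int) -> float:
    """Fraction of the host's cores that the container is keeping busy."""
    return cores_in_use(docker_cpu_percent) / host_cores


if __name__ == "__main__":
    used = cores_in_use(1600.0)           # peak value seen in docker stats
    share = host_utilisation(1600.0, 24)  # the Alfresco server has 24 cores
    print(f"{used:.0f} of 24 cores busy ({share:.0%} of the host)")
```

In other words, during a spike the container keeps roughly 16 of the 24 cores busy, about two thirds of the host.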
At these moments, PostgreSQL starts timing out on active processes, and the entire environment (PostgreSQL and Alfresco) has to be restarted.
We have already advised the development teams to stop using CMIS and to use only the REST API to send, search, update and delete nodes in Alfresco. Alfresco receives thousands of GET/PUT requests. All applications connect to Alfresco with a single user.
In PostgreSQL we are monitoring one query in particular:
select assoc.id as id,
  parentNode.id as parentNodeId,
  parentNode.version as parentNodeVersion,
  parentStore.protocol as parentNodeProtocol,
  parentStore.identifier as parentNodeIdentifier,
  parentNode.uuid as parentNodeUuid,
  childNode.id as childNodeId,
  childNode.version as childNodeVersion,
  childStore.protocol as childNodeProtocol,
  childStore.identifier as childNodeIdentifier,
  childNode.uuid as childNodeUuid,
  assoc.type_qname_id as type_qname_id,
  assoc.child_node_name_crc as child_node_name_crc,
  assoc.child_node_name as child_node_name,
  assoc.qname_ns_id as qname_ns_id,
  assoc.qname_localname as qname_localname,
  assoc.is_primary as is_primary,
  assoc.assoc_index as assoc_index
from alf_child_assoc assoc
  join alf_node parentNode on (parentNode.id = assoc.parent_node_id)
  join alf_store parentStore on (parentStore.id = parentNode.store_id)
  join alf_node childNode on (childNode.id = assoc.child_node_id)
  left join alf_store childStore on (childStore.id = childNode.store_id)
where parentNode.id = 988
Practically every process in PostgreSQL runs this query, and it returns thousands of documents at a time (sometimes millions). The query stays running in PostgreSQL for a long time, with many deadlocks, and when the processes start to turn red the following error is shown: ERROR: relation "alf_bootstrap_lock" does not exist at character 15
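In case it helps with the diagnosis, this is roughly how we could capture what the backends are doing during an incident. It is only a sketch: the SQL reads the standard pg_stat_activity view, but the helper name summarise_activity and the row layout (one dict per backend) are our own assumptions, and the database driver needed to actually run the query is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional, Tuple

# Reads the standard pg_stat_activity view; run it with psql or any driver.
ACTIVITY_SQL = """
select pid, state, wait_event_type, wait_event,
       now() - query_start as running_for,
       left(query, 80) as query_head
from pg_stat_activity
where state <> 'idle'
order by running_for desc
"""


def summarise_activity(
    rows: Iterable[Mapping[str, Optional[str]]],
) -> Dict[Tuple[str, str], int]:
    """Count backends per (state, wait_event_type) pair.

    Each row is assumed to be a dict-like record with at least the
    'state' and 'wait_event_type' columns returned by ACTIVITY_SQL.
    """
    counts: Counter = Counter()
    for row in rows:
        state = row["state"] or "unknown"
        # wait_event_type is NULL when the backend is not waiting on anything
        wait_type = row.get("wait_event_type") or "none"
        counts[(state, wait_type)] += 1
    return dict(counts)
```

If most backends turned out to be active with wait_event_type Lock, that would suggest lock contention rather than pure CPU load, but we have not confirmed this yet.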
Has anyone come across this type of scenario? If you need more data, I am happy to provide it.
There is a ticket, #1018, at github.com/Alfresco/acs-deployment/issues/1018, where CHUNT replied that the question should be posted here.
I would like to add that, using APM Search for monitoring, we discovered that the query above is triggered by Solr (through Solr API GET requests). Could anyone help me with this, or point me in the right direction?