CMIS query performance slow - Solr index improvement?
This problem is experienced with Alfresco 6.0 (could be in Alfresco 5.2 as well). I do not know how much the version matters to this issue.
I have a custom model defined. This model has a custom type and a custom aspect with a set of properties.
We use the CMIS API to communicate with the repository. Functionally, everything has been working great.
To show the information on the document objects (without the binary content), we run the following query through the CMIS API. It is clear that this query takes a long time to respond.
SELECT doc.*, sloalias.*
FROM cmis:document AS doc
JOIN slo:documentProperties AS sloalias ON doc.cmis:objectId = sloalias.cmis:objectId
WHERE IN_TREE(doc, '0a813931-e5d4-4a4a-82ad-8ccbbd1c2405')
Here, 0a813931-e5d4-4a4a-82ad-8ccbbd1c2405 is the ID of the folder under which the documents reside.
If we remove the call to this query, the page loads very quickly.
With the query in place, on a production system with millions of documents, it consistently takes 20 seconds to respond.
Please note that I have not done anything special in my custom model definition with respect to indexing the properties. So whatever is happening must come from the default behavior.
I wonder if improper use of IN_TREE is the cause of the slowness. I have read somewhere that a CMIS query can be written in such a way that it does not go through Solr (the index), though I do not know whether that is a good idea.
I also wonder if this performance issue has anything to do with other things like default auditing, or default (excessive) logging, etc.
Re: CMIS query performance slow - Solr index improvement?
What is the page size you are requesting for this CMIS query? Have you already set the logger for the SolrQueryHTTPClient class to debug and checked the logs to see whether the long query duration actually originates from the SOLR query call? Do you use dynamic authorities in any of your customisations/extensions, which can indirectly cause some ACL check optimisations to be disabled?
The use of IN_TREE is not necessarily "improper". It is a valid predicate. However, if you only need to find documents directly in that folder, and not in any sub-folders, then IN_TREE could be replaced by IN_FOLDER, which would allow the query to execute via the DB instead of SOLR. But bear in mind that executing a query against the DB is not necessarily faster than executing it against SOLR. Performance depends on a lot of factors, such as the state of DB indexes and statistics, data selectivity, etc. For larger data sets, SOLR is very likely to be the faster option more often than not, since a DB query has to perform ACL checking as a post-filter step, causing additional DB interactions.
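To make the IN_TREE / IN_FOLDER swap concrete, here is a minimal sketch in plain Java that builds both variants of the query as strings. It assumes the original statement filters with IN_TREE on the folder ID given in the question and that the aspect alias is sloalias throughout. The class SloQueryBuilder and its build method are hypothetical names used for illustration only; they are not part of Alfresco or OpenCMIS. Whether the repository actually routes the IN_FOLDER variant to the DB depends on its configuration and on the rest of the statement, which this sketch does not verify.

```java
/** Hypothetical helper (not an Alfresco or OpenCMIS class) that builds the two query variants. */
final class SloQueryBuilder {

    // Join copied from the question; slo:documentProperties is the poster's custom aspect.
    private static final String BASE =
            "SELECT doc.*, sloalias.* "
            + "FROM cmis:document AS doc "
            + "JOIN slo:documentProperties AS sloalias "
            + "ON doc.cmis:objectId = sloalias.cmis:objectId ";

    private SloQueryBuilder() {
    }

    /**
     * Returns the CMIS statement for the given folder.
     * includeSubFolders = true  -> IN_TREE   (folder and all descendants)
     * includeSubFolders = false -> IN_FOLDER (direct children only)
     */
    static String build(String folderId, boolean includeSubFolders) {
        String predicate = includeSubFolders ? "IN_TREE" : "IN_FOLDER";
        // CMIS QL string literals escape backslash and single quote with a backslash.
        String escaped = folderId.replace("\\", "\\\\").replace("'", "\\'");
        return BASE + "WHERE " + predicate + "(doc, '" + escaped + "')";
    }

    public static void main(String[] args) {
        String folderId = "0a813931-e5d4-4a4a-82ad-8ccbbd1c2405"; // folder ID from the question
        System.out.println(build(folderId, true));   // variant searching the whole subtree
        System.out.println(build(folderId, false));  // variant searching direct children only
    }
}
```

Running both statements against the same folder and comparing the response times is a cheap way to check whether the DB path is actually faster for your data before changing the application.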