SearchOptimisation


resplin

Obsolete Pages{{Obsolete}}

The official documentation is at: http://docs.alfresco.com




Search Read Evaluation Performance Improvement


Introduction

Search query performance is disproportionately
affected by read permission evaluation on the nodes returned by a query
(compared to the actual search), especially for larger repositories. This enhancement improves read
permission evaluation performance by optimising the read permission
evaluation code and using more caching where possible.

Approach

A new method PermissionService.hasReadPermission provides the
optimised read permission evaluation. It is used by the Alfresco Acegi security filters to filter search results by read
permission. It performs the following checks in order for each node in the
query results:

i) Can the current user read the node, i.e. does the user have explicit READ permission on the node? This is determined by looking
up the node ref from the Lucene index, then the node's ACL from the nodes cache, then the read authorities for that ACL from the aclReaders cache, and finally checking whether the authenticated user's
authorisations include any of the authorities with read permission. If this information is
cached, the result is calculated very quickly.
ii) Is the current user the admin user?
iii) Is the current user the owner of the node? This is a lookup in the nodeOwner cache. If this information is cached, the result is calculated very quickly.
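
The order of the three checks can be sketched as follows. This is a minimal illustration, not the Alfresco implementation: the class ReadCheckSketch, its plain maps standing in for the 'nodes', 'aclReaders' and 'nodeOwner' caches, and the method signature are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the optimised check; plain maps play the role of the caches. */
final class ReadCheckSketch {
    private final Map<String, Long> nodeToAcl;        // stands in for the 'nodes' cache (node ref -> ACL id)
    private final Map<Long, Set<String>> aclReaders;  // stands in for the 'aclReaders' cache (ACL id -> read authorities)
    private final Map<String, String> nodeOwner;      // stands in for the 'nodeOwner' cache (node ref -> owner)

    ReadCheckSketch(Map<String, Long> nodeToAcl,
                    Map<Long, Set<String>> aclReaders,
                    Map<String, String> nodeOwner) {
        this.nodeToAcl = nodeToAcl;
        this.aclReaders = aclReaders;
        this.nodeOwner = nodeOwner;
    }

    /** Runs the checks in the order i), ii), iii) and stops at the first one that succeeds. */
    boolean hasReadPermission(String nodeRef, String user,
                              Set<String> authorisations, boolean isAdmin) {
        // i) explicit READ: do the user's authorisations overlap the ACL's read authorities?
        Long aclId = nodeToAcl.get(nodeRef);
        if (aclId != null) {
            Set<String> readers = aclReaders.getOrDefault(aclId, Collections.emptySet());
            if (!Collections.disjoint(readers, authorisations)) {
                return true;
            }
        }
        // ii) the admin user can read everything
        if (isAdmin) {
            return true;
        }
        // iii) the owner of the node can read it
        return user.equals(nodeOwner.get(nodeRef));
    }
}
```

Each step is a map lookup, which is why the result is quick to calculate once the corresponding caches are populated.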

The 'nodes' and 'nodeOwner' caches already exist. A new cache, 'aclReaders', which maps node ACLs to the
authorities with read permission on each ACL, is introduced. A
transaction-specific cache stores the authorisations of the authenticated user, because fetching them is an expensive
operation when repeated for every result of a search.
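
The effect of the transaction-specific cache can be illustrated with a small memoisation sketch. The class TxnAuthorisationCache, the lookup function passed to it and the endTransaction method are hypothetical names for this example only; the real cache is managed by the repository.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical per-transaction memoisation of a user's authorisations. */
final class TxnAuthorisationCache {
    private final Map<String, Set<String>> perTxn = new HashMap<>();
    private final Function<String, Set<String>> expensiveLookup;
    int lookups = 0; // counts calls to the expensive lookup, for illustration only

    TxnAuthorisationCache(Function<String, Set<String>> expensiveLookup) {
        this.expensiveLookup = expensiveLookup;
    }

    /** Returns the cached authorisations, computing them at most once per transaction. */
    Set<String> authorisationsFor(String user) {
        return perTxn.computeIfAbsent(user, u -> {
            lookups++;
            return expensiveLookup.apply(u);
        });
    }

    /** Discards the cached values when the transaction ends. */
    void endTransaction() {
        perTxn.clear();
    }
}
```

With this arrangement, looping over a large result set triggers the expensive lookup once rather than once per node.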

This optimisation can be turned off and on with the 'system.readpermissions.optimise' property.
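
As an illustration, and assuming the property takes a boolean value and is overridden in alfresco-global.properties (neither detail is stated on this page), switching the optimisation off might look like this:

```properties
# Assumption: boolean value; true enables the optimised read permission check
system.readpermissions.optimise=false
```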

Caveats

a) In order to achieve the speed improvement, the only supported dynamic authorities are those present in the default permissions model. The presence of dynamic authorities other than those in the default permissions model will force use of the standard permissions evaluation.
b) Again, in order to achieve the speed improvement, the permission check is restricted to explicit read permission. Combinations of lower-level read permissions will not be found. For example, given group A with READ_CONTENT and group B with READ_PROPERTIES, a user in both groups should have read permission, but the optimised check will not detect it.
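
Caveat b) can be made concrete with a small sketch. The class ExplicitReadCaveat and both methods are hypothetical, and the "full" evaluation simply follows the example in the caveat (READ_CONTENT plus READ_PROPERTIES); the actual composition of read permission is defined by the permissions model.

```java
import java.util.Set;

/** Hypothetical illustration of caveat b); not the Alfresco permission model. */
final class ExplicitReadCaveat {
    /** Optimised path: only an explicit READ grant counts. */
    static boolean fastPathCanRead(Set<String> granted) {
        return granted.contains("READ");
    }

    /** Illustrative full evaluation that also accepts the combination named in the caveat. */
    static boolean fullEvaluationCanRead(Set<String> granted) {
        return granted.contains("READ")
                || (granted.contains("READ_CONTENT") && granted.contains("READ_PROPERTIES"));
    }
}
```

For a user whose combined grants are READ_CONTENT (via group A) and READ_PROPERTIES (via group B), the full evaluation in this sketch grants read access while the fast path does not.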

Results

The setup is a store with 1,000,000 nodes that have inherited read
permission for user X and 10 nodes that have explicit read permission
for user Y.

The test runs a 'TYPE:\'cm:content\'' search 4 times in a
row against the store in a newly started repository.

i) The nodes cache (txn and shared) has max entries set to 2001000 (bi-directional cache,
so multiply number of nodes by 2). The nodeOwner cache (txn and
shared) has max entries set to 1001000. The aclReaders cache has max
entries set to 10000 (more than enough in this case given that
permissions are inherited).

Optimised

User Y, 10 nodes in 81.312237
User Y, 10 nodes in 11.170863
User Y, 10 nodes in 11.195379
User Y, 10 nodes in 11.189759

Not Optimised

User Y, 10 nodes in 103.71217
User Y, 10 nodes in 104.995179
User Y, 10 nodes in 106.320423
User Y, 10 nodes in 106.424857

ii) The nodes cache (txn and shared) has max entries set to 2001000 (bi-directional cache,
so multiply number of nodes by 2). The nodeProperties, nodeAspects and
nodeOwner caches (txn and
shared) have max entries set to 1001000. The aclReaders cache has max
entries set to 10000 (more than enough in this case given that
permissions are inherited).

Not Optimised

User Y, 10 nodes in 97.32361
User Y, 10 nodes in 25.924495
User Y, 10 nodes in 25.829163
User Y, 10 nodes in 25.990601

Optimised

User Y, 10 nodes in 81.277027
User Y, 10 nodes in 11.420287
User Y, 10 nodes in 11.214065
User Y, 10 nodes in 11.299918

iii) The nodes, nodeAspects, nodeProperties and nodeOwner caches are
not big enough to hold all mappings.

Optimised

User Y, 10 nodes in 80.149073
User Y, 10 nodes in 74.073088

Not Optimised

User Y, 10 nodes in 94.331461
User Y, 10 nodes in 86.959984
User Y, 10 nodes in 87.951549
User Y, 10 nodes in 87.681294

Conclusion

The optimised version is faster with cold caches. For warm caches, the
speed improvement depends on the sizes of the caches. The optimised
version benefits from the nodes, nodeOwner and aclReaders caches being
sized to store all mappings needed for searches, but does not require large aspects and properties caches. The non-optimised version also needs the
aspects and properties caches sized to store all mappings in order to
come close, and even then it is slower than the optimised version.

For a cold aclReaders cache, having lots of non-inherited permissions (and hence more ACLs)
will slow down permission evaluation while the aclReaders cache is built up.

In order to achieve the best performance, ensure that your nodes,
nodeOwner and aclReaders caches are sized appropriately. For the aclReaders cache
make sure the transactional cache is large too, otherwise it will overflow and the
shared cache will be cleared.
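
As a rough sizing check based on the test setup (1,000,000 + 10 nodes), the minimum number of entries can be estimated as below. The helper CacheSizing.minEntries is a hypothetical name; it only encodes the rule stated in the results that a bi-directional cache needs two entries per node. The aclReaders cache is not covered, since its size depends on the number of distinct ACLs rather than on the node count.

```java
/** Hypothetical helper for estimating minimum cache sizes from a node count. */
final class CacheSizing {
    /** A bi-directional cache holds two mappings per node, a one-way cache holds one. */
    static long minEntries(long nodeCount, boolean biDirectional) {
        return biDirectional ? 2 * nodeCount : nodeCount;
    }
}
```

For 1,000,010 nodes this gives 2,000,020 entries for the nodes cache and 1,000,010 for the nodeOwner cache, both just under the 2001000 and 1001000 limits used in the tests.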

Note that larger result sets generally see more of a performance improvement than smaller ones (provided the caches are sized appropriately). This is because permission evaluation essentially loops over the result set, checking read permission for each result, so the time saved by the faster read permission evaluation code accumulates over the complete result set.
