Limitations of Lucene Results

vanhoeck
Member II

Hi,

I was testing the abilities of Lucene on an Alfresco server with about 10k files. When I run the query as Admin, I get 1000 results (intentionally didn't raise that limit). In the next step, I've added a new user (Homer) and I gave Homer one file in his home folder. When I ran the query as Homer, he didn't get any file back.

I'm guessing Lucene stops at 1000 files and then starts to check the permissions one by one. Since there are 10000 files on Alfresco and only 1000 files in the result, he just never gets to the file I was searching for. This implies that a query which has 1000+ results in total, but only 50 for the current user, might not get all 50 answers back. Is there any way to correct this without changing the repository.properties-file? And is there any way of knowing whether all results were returned? I know I can change system.acl.maxPermissionChecks to 10000, but that wouldn't solve the problem once the number of files goes up (to 100000 for example).

Thanks,
Tim

PS: the lines in the repository.properties-file I was talking about.

# Properties to limit resources spent on individual searches
#
# The maximum time spent pruning results
system.acl.maxPermissionCheckTimeMillis=10000
# The maximum number of results to perform permission checks against
system.acl.maxPermissionChecks=1000
10 Replies
francescof
Member II

Re: Limitations of Lucene Results

Your problem is most likely not caused by "system.acl.maxPermissionChecks" being set to "1000". You should raise "system.acl.maxPermissionCheckTimeMillis" to the longest time you are willing to wait for search results. It seems that your first test returned 1000 results in less than 10 seconds and then stopped because "maxPermissionChecks" was hit. Your second test probably stopped after hitting "maxPermissionCheckTimeMillis" without returning results, because it didn't have enough time to reach the document visible to the current user.

In other words: with default settings, Lucene result filtering stops after 1000 checks or after 10 seconds (whichever comes first). The only real solution to this problem is speeding up permission checks, since getting raw results out of the Lucene indexes is already really fast. A temporary workaround for you could be setting a very high "maxPermissionCheckTimeMillis" value (and narrowing down your search criteria as much as possible to reduce the size of the raw result set).
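To make the two budgets concrete, here is a toy model of pruning a raw result list under a check limit and a time limit. It is only a sketch and not Alfresco's actual implementation: the class PruningSketch, the Pruned holder and the prune method are made-up names, and the two parameters merely play the role of "system.acl.maxPermissionChecks" and "system.acl.maxPermissionCheckTimeMillis".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Toy model of post-query permission pruning; hypothetical, not Alfresco code. */
final class PruningSketch {

    /** Visible hits plus a flag saying whether a budget cut the scan short. */
    static final class Pruned {
        final List<Integer> visible;
        final boolean truncated;

        Pruned(List<Integer> visible, boolean truncated) {
            this.visible = visible;
            this.truncated = truncated;
        }
    }

    /**
     * Checks raw hits in order until the check budget or the time budget runs out.
     * maxChecks stands in for system.acl.maxPermissionChecks,
     * maxMillis for system.acl.maxPermissionCheckTimeMillis.
     */
    static Pruned prune(List<Integer> rawHits, Predicate<Integer> canRead,
                        int maxChecks, long maxMillis) {
        List<Integer> visible = new ArrayList<>();
        long start = System.nanoTime();
        long budgetNanos = maxMillis * 1_000_000L;
        int checks = 0;
        for (Integer hit : rawHits) {
            boolean outOfChecks = checks >= maxChecks;
            boolean outOfTime = System.nanoTime() - start > budgetNanos;
            if (outOfChecks || outOfTime) {
                // Unchecked hits remain, so the visible list may be incomplete.
                return new Pruned(visible, true);
            }
            checks++;
            if (canRead.test(hit)) {
                visible.add(hit);
            }
        }
        // Every raw hit was checked within both budgets.
        return new Pruned(visible, false);
    }
}
```

In this model, if the raw list holds 10000 hits, the only readable one sits at position 5000 and maxChecks is 1000, prune returns an empty list with truncated set to true. Raising maxChecks to 10000 returns the single readable hit.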

If you are building a custom solution using the Alfresco Foundation API and don't mind disabling permission checks on results, you may try:

searchParameters.setLimitBy(LimitBy.NUMBER_OF_PERMISSION_EVALUATIONS);
searchParameters.setLimit(0);

This disables every permission check and returns the whole raw result set.
vanhoeck
Member II

Re: Limitations of Lucene Results

Thanks for your answer. I changed the time to 70000 and even 200000, but the one file still did not appear. And even if it did work, I would still like to keep the search time limited as the number of files grows. Is there any way of knowing that Lucene has hit its limits (some kind of overflow check)? In that case, I can just return an error asking for a more specific query.
francescof
Member II

Re: Limitations of Lucene Results

When the configured Lucene limits are hit, Alfresco's ACLEntryAfterInvocationProvider sets LimitBy.NUMBER_OF_PERMISSION_EVALUATIONS in the result set metadata:

ResultSet resultSet = searchService.query(…);

LimitBy limitBy = resultSet.getResultSetMetaData().getLimitedBy();

if (LimitBy.NUMBER_OF_PERMISSION_EVALUATIONS.equals(limitBy)) {
   /* Search hit lucene configured limits… return the error here. */
   …
}
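One way to package that check, so that callers get either a complete result or an error asking for a narrower query, is sketched below. It compiles without Alfresco on the classpath, so everything in it is a stand-in: the Limit enum only mimics the flag read from the result set metadata, and TruncationGuard, requireComplete and QueryTooBroadException are hypothetical names chosen for illustration.

```java
import java.util.List;

/** Sketch only: local stand-in types so the example compiles without Alfresco. */
final class TruncationGuard {

    /** Stand-in for the limit flag read from the result set metadata. */
    enum Limit { UNLIMITED, NUMBER_OF_PERMISSION_EVALUATIONS }

    /** Hypothetical exception that asks the caller to narrow the query. */
    static final class QueryTooBroadException extends RuntimeException {
        QueryTooBroadException(String message) {
            super(message);
        }
    }

    /** Returns the hits unchanged, or throws if permission pruning stopped early. */
    static <T> List<T> requireComplete(List<T> hits, Limit limitedBy) {
        if (limitedBy == Limit.NUMBER_OF_PERMISSION_EVALUATIONS) {
            throw new QueryTooBroadException(
                "Search hit the permission-check limit; please refine the query.");
        }
        return hits;
    }
}
```

In real code the second argument would come from resultSet.getResultSetMetaData().getLimitedBy(), as in the snippet above.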
vanhoeck
Member II

Re: Limitations of Lucene Results

Thanks!
ddanninger
Member II

Re: Limitations of Lucene Results

Is it also possible to handle this without using the Alfresco Foundation API?

For example with the JavaScript API, the SOAP services, or some general parameters in Alfresco?
nitinagarwal198
Member II

Re: Limitations of Lucene Results

How can I increase the Lucene result limit without modifying code?
mrogers
Active Member II

Re: Limitations of Lucene Results

Switch to Alfresco 4 and use Solr.
nitinagarwal198
Member II

Re: Limitations of Lucene Results

Thanks for your reply.

I am using Alfresco 3.2 Community and I don't want to upgrade it. So can you please tell me how I can increase the Lucene result limit? Is there a configuration file for this?
mrogers
Active Member II

Re: Limitations of Lucene Results

See the first post in this thread. On 3.2, I think that's all you can do.