How to speed up solr6 indexing + search service 2.0 + alfresco 7.1 (Need 21 hours to index)

mtgdavidchow
Partner

We are using the following community version setup

Alfresco 7.1
Search Services 2.0.2
solr-spec 6.6.5

The stores (Solr cores) currently hold:
alfresco
numDocs:31212220
maxDoc:37005238

archive
numDocs:21257292
maxDoc:21445273

 

Our questions are

1. A full index of the above content takes about 21 hours. How can we speed it up? We are running close to the default solr6 settings, which may not be suited to production.
2. Is there a good way to check whether indexing has completed? We arrived at the 21 hours by checking regularly until numDocs stopped changing.
3. We found that snapshot folders are generated in solr6 and keep accumulating. Is there a way to clean them up?

 

Thank you

3 Replies
cesarista
Customer

Re: How to speed up solr6 indexing + search service 2.0 + alfresco 7.1 (Need 21 hours to index)

Hi, AFAIK:

1. IMO it is not the worst reindexing time I've seen. The time depends on many things (resources, the number and mimetypes of the documents, parametrization). To speed it up, you may adjust several things, including the JVM resources of the SOLR machine (if you have one dedicated to indexing), the number of documents indexed per batch, and the cron for querying the database for transactions (solrcore.properties). In addition, Alfresco 7.x has some additional database indexes that may improve the indexing process. Other options reduce the size of the index, and possibly the time for full processing: indexing only metadata, disabling automatic metadata extraction, not using cross-locale / exact-term queries, not using the suggest feature, not using fingerprints, and so on.

2. Indexing is complete when the Alfresco Admin Console (with OOTB Support Tools in Community) shows 0 transactions remaining to index for the Search Service, in both cores.

3. The snapshot folder you refer to is probably a backup that is taken nightly. It is useful for restoring indices in a backup procedure. It may occupy several GB too (be careful with the disk), so it is a good idea to point it to a proper directory. In 7.1 the path is configured in solrcore.properties. You may also want to keep a smaller number of backups (3 is the default value). To disable this backup, you may set the cron property for SOLR backups (in alfresco-global.properties) to a future date, in 2029 for example.
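
For point 1, the batch, cron and feature settings mentioned above live in each core's solrcore.properties. A minimal sketch follows, assuming the property names found in the Search Services 2.0 template file; the values are purely illustrative, not recommendations or confirmed defaults, so compare them with the file in your own installation before changing anything.

```properties
# solrcore.properties (one copy per core: alfresco, archive)
# Values are illustrative assumptions, not tuned recommendations.

# How often the trackers poll the repository for new transactions (Quartz cron).
alfresco.cron=0/10 * * * * ? *

# Batch sizes used when fetching transactions and nodes from the repository.
alfresco.transactionDocsBatchSize=2000
alfresco.nodeBatchSize=100

# Threads available to the trackers.
alfresco.corePoolSize=8

# Metadata-only indexing: skip content transformation.
alfresco.index.transformContent=false

# Skip document fingerprints.
alfresco.fingerprint=false
```

Changes to this file need a Solr restart to take effect. Note that metadata-only indexing means document content is no longer searchable as full text, so it only fits if that trade-off is acceptable.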

Kind regards.

--C.

mtgdavidchow
Partner

Re: How to speed up solr6 indexing + search service 2.0 + alfresco 7.1 (Need 21 hours to index)

Thank you @cesarista 

For #2, is there a link on how to install it? I am using Windows.

For #3, I can't find any related setting in alfresco-global.properties. Can you point me to it? Also, in my previous testing, the snapshots were kept for at least 6 days (after which I deleted them all manually before rebuilding), so I thought the snapshots were not being cleaned up. Is there a config setting to limit the number of backups (such as the 3 you mentioned)?

cesarista
Customer

Re: How to speed up solr6 indexing + search service 2.0 + alfresco 7.1 (Need 21 hours to index)

Hi:

 

For #2: You may get the OOTB package (AMP) from docker-installer and install it like any other standard AMP.

 

For #3: 

In alfresco-global.properties you may set something like the following to keep a fixed number of SOLR backups:

 

solr.backup.alfresco.numberToKeep=3

solr.backup.archive.numberToKeep=3
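
The backup cron mentioned earlier in the thread is set in the same file. A minimal sketch, assuming the property names `solr.backup.alfresco.cronExpression` and `solr.backup.archive.cronExpression` from the Alfresco backup documentation (please verify them against your version); the schedule and the year used here are illustrative only.

```properties
# alfresco-global.properties
# Quartz cron format: sec min hour day-of-month month day-of-week [year]

# Illustrative schedule: back up the alfresco core every night at 02:00.
solr.backup.alfresco.cronExpression=0 0 2 * * ?

# Illustrative way to effectively disable the archive core backup:
# the optional year field pushes the first run far into the future.
solr.backup.archive.cronExpression=0 0 4 * * ? 2099
```

The repository has to be restarted for changes to alfresco-global.properties to be picked up.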

 

Regards.

--C.