Our index has the following stats for the two SpacesStore cores:
- alfresco: numDocs 31212220, maxDoc 37005238
- archive: numDocs 21257292, maxDoc 21445273
Our questions are:
1. Fully indexing the above takes about 21 hours. How can we speed it up? We are using nearly default Solr 6 settings, which may not be suitable for production.
2. Is there a good way to check whether indexing has completed? We measured the 21 hours by checking regularly until numDocs stopped changing.
3. We found that snapshot folders are being generated in Solr 6 and keep accumulating. Is there a way to clean them up?
Re: How to speed up solr6 indexing + search service 2.0 + alfresco 7.1 (Need 21 hours to index)
1. IMO this is not the worst reindexing time I've seen. The time depends on many things (hardware resources, the number and MIME types of documents, configuration). To speed it up, you can adjust several things, including the JVM resources of the Solr machine (if you have one dedicated to indexing), the number of documents per batch, and the cron that controls how often the database is queried for new transactions (solrcore.properties). In addition, Alfresco 7.x includes some additional database indexes that may improve indexing. Other options reduce the index size, and possibly the time for a full reindex: indexing only metadata, disabling automatic metadata extraction, not using cross-locale or exact term queries, not using the suggest feature, and not using fingerprints.
2. Indexing is complete when the Search Service page in the Alfresco Admin Console (available in Community through the OOTB Support Tools add-on) shows 0 transactions remaining to index for both cores.
3. The snapshot folders you refer to are probably backups, which are made daily during the night. They are useful for restoring the indices as part of a backup procedure. They can also take up several GB (watch your disk space), so it is a good idea to point them to a suitable directory. In 7.1 the path is configured in solrcore.properties. You may also want to keep fewer backups (the default is 3). To disable this backup, you can set the cron property for Solr backups (in alfresco-global.properties) to a date in the future, for example in 2029.
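For point 1, it helps to turn the 21-hour figure into a throughput baseline so that each tuning change (JVM memory, batch sizes, tracking cron) can be measured against it. Below is a minimal Python sketch, assuming the numDocs values from the question, that both cores are tracked concurrently over the same 21 hours, and that numDocs counts Solr documents, which may not map one-to-one to repository nodes. The helper names are hypothetical.

```python
def docs_per_second(num_docs: int, hours: float) -> float:
    """Average indexing throughput in Solr documents per second."""
    return num_docs / (hours * 3600)


def estimated_hours(num_docs: int, rate: float) -> float:
    """Hours needed to index num_docs at the given documents-per-second rate."""
    return num_docs / rate / 3600


# numDocs reported for the two cores in the question
ALFRESCO_DOCS = 31_212_220
ARCHIVE_DOCS = 21_257_292
total = ALFRESCO_DOCS + ARCHIVE_DOCS  # 52,469,512 Solr documents

# Assumes both cores finished within the same 21-hour window.
baseline = docs_per_second(total, 21)
print(f"baseline: {baseline:.0f} docs/s")  # → baseline: 694 docs/s

# Projected duration if a tuning change raised throughput by 50%
print(f"at 1.5x: {estimated_hours(total, baseline * 1.5):.1f} h")  # → at 1.5x: 14.0 h
```

Re-running the calculation with the newly measured duration after each change shows whether that change paid off; changing one setting at a time keeps the comparison meaningful.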
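For point 2, the same check can be scripted instead of watched in the Admin Console. The Python sketch below assumes that Search Services answers the core admin SUMMARY action at http://localhost:8983/solr/admin/cores?action=SUMMARY&wt=json, that the per-core entries sit under a top-level "Summary" key, and that each entry carries an "Approx transactions remaining" field. These names, the host and port, and any authentication your installation needs (shared secret or mTLS) are assumptions to verify against your own output. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Optional

# Assumed default endpoint; adjust host, port and scheme for your setup.
SUMMARY_URL = "http://localhost:8983/solr/admin/cores?action=SUMMARY&wt=json"
# Assumed key name in each core's summary; verify against your own output.
REMAINING_KEY = "Approx transactions remaining"


def fetch_summary(url: str = SUMMARY_URL, headers: Optional[dict] = None) -> dict:
    """Fetch the SUMMARY report as a dict; headers can carry any auth you need."""
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def remaining_per_core(summary: dict) -> dict:
    """Map each core name to its remaining-transaction count, where reported."""
    cores = summary.get("Summary", {})
    return {
        name: int(info[REMAINING_KEY])
        for name, info in cores.items()
        if isinstance(info, dict) and REMAINING_KEY in info
    }


def is_fully_indexed(summary: dict, cores=("alfresco", "archive")) -> bool:
    """True only if every expected core reports 0 transactions remaining."""
    remaining = remaining_per_core(summary)
    # A core that is missing from the report counts as not finished.
    return all(remaining.get(core) == 0 for core in cores)
```

Because a missing core counts as not finished, a wrong core name fails safe. Something like is_fully_indexed(fetch_summary()) can then be polled every few minutes instead of comparing numDocs by hand.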
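For point 3, until the backup settings are adjusted, old snapshot folders can also be pruned by hand. The sketch below is a hypothetical Python helper (not part of Alfresco or Solr) that keeps the N newest "snapshot.*" directories in a backup location and deletes the rest. It assumes Solr's default naming, "snapshot." followed by a yyyyMMddHHmmssSSS timestamp, so that sorting by name also sorts by age. Run it with dry_run=True first, and never while a backup is being written.

```python
import shutil
from pathlib import Path


def prune_snapshots(backup_dir: str, keep: int = 3, dry_run: bool = True) -> list:
    """Delete all but the `keep` newest snapshot.* directories in backup_dir.

    Assumes names like snapshot.20240101020000123, so name order == age order.
    Returns the directories that were (or, in dry-run mode, would be) removed.
    """
    if keep < 0:
        raise ValueError("keep must be >= 0")
    snapshots = sorted(
        (p for p in Path(backup_dir).iterdir()
         if p.is_dir() and p.name.startswith("snapshot.")),
        key=lambda p: p.name,
    )
    # Everything except the last `keep` entries (the newest ones) is removed.
    to_remove = snapshots[:-keep] if keep else snapshots
    for path in to_remove:
        if dry_run:
            print(f"would remove {path}")
        else:
            shutil.rmtree(path)
    return to_remove
```

For example, prune_snapshots(r"D:\backups\alfresco", keep=3) (a hypothetical Windows path) lists what would be deleted; pass dry_run=False once the list looks right.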
For #2, is there a link explaining how to install the Support Tools? I am using Windows.
For #3, I can't find any related setting in alfresco-global.properties. Can you point me to it? Also, in my previous testing, snapshots from at least 6 days were kept (I then deleted all of them manually before rebuilding), so I assumed the snapshots are not being cleaned up. Is there a setting to limit the number of backups (such as the 3 you mentioned)?