Solr 4 indexing lag

joe_l3
Active Member II

Hello, 

Has anyone experienced Solr 4 indexing lag with a huge content store?

I am facing a problem with Solr and NRT (Near Real-Time) indexing. In my environment, Solr takes too long to sync its indexes with the DB data: documents become searchable only after 8-10 minutes.
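One way to put a firm number on the "8-10 minutes" is to upload a uniquely named test document and poll a search until it appears. The Python sketch below assumes you supply `is_searchable` yourself, for example a wrapper around a search request for the test document's name. The function and parameter names are hypothetical; this is not an Alfresco or Solr API.

```python
import time
from typing import Callable, Optional


def time_to_visibility(
    is_searchable: Callable[[], bool],
    poll_interval_s: float = 5.0,
    timeout_s: float = 900.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll is_searchable() until it returns True.

    Returns the seconds elapsed since the first poll, or None if timeout_s
    passes first. `clock` and `sleep` are injectable so the loop can be tested.
    """
    start = clock()
    while True:
        if is_searchable():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(poll_interval_s)
```

Call it right after the upload completes. The result is only as precise as the poll interval (about 5 seconds with the default), which is plenty for a lag measured in minutes.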

Here is my stack:

  • Alfresco Community 5.2 - 1 server - 12 vCPU - 40 GB RAM - JVM heap 20 GB
  • Solr 4 - 1 server - 12 vCPU - 30 GB RAM - I/O throughput 1843.93 MB/s - JVM heap 18 GB
  • MySQL 5.7 - 1 server - 12 vCPU - 20 GB RAM

Sizing and settings worth mentioning:

  • Size on disk of content repository: 5 TB
  • Size on disk of Solr indexes: 300 GB
  • Number of documents in Solr: 140 million
  • Content indexing disabled
  • Solr suggester disabled
  • Alfresco tracking every 8 seconds
  • 11 indexing threads for each tracking transaction
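For a sense of scale, these figures work out to roughly 2 KB of index per document (metadata only, since content indexing is disabled). An 8-10 minute lag also corresponds to 60-75 of the 8-second tracking intervals, so the tracker is many cycles behind rather than just one. The quick check below treats 300 GB as decimal gigabytes, which is an assumption:

```python
# Back-of-the-envelope numbers from the sizing list above.
INDEX_BYTES = 300 * 10**9      # 300 GB of Solr index (decimal GB: an assumption)
NUM_DOCS = 140 * 10**6         # 140 million documents
TRACKING_INTERVAL_S = 8        # tracker runs every 8 seconds


def bytes_per_doc(index_bytes: int, num_docs: int) -> float:
    """Average index size per document."""
    return index_bytes / num_docs


def cycles_behind(lag_s: float, interval_s: float) -> float:
    """Number of tracking intervals that fit into an observed lag."""
    return lag_s / interval_s


print(f"{bytes_per_doc(INDEX_BYTES, NUM_DOCS):.0f} bytes/doc")  # prints "2143 bytes/doc"
print(cycles_behind(8 * 60, TRACKING_INTERVAL_S))   # 60.0 (8-minute lag)
print(cycles_behind(10 * 60, TRACKING_INTERVAL_S))  # 75.0 (10-minute lag)
```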

 

angelborroy
Alfresco Employee

Re: Solr 4 indexing lag

Initially, it appears that the database might be the bottleneck. Do you have any metrics on the performance of the database queries?
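To get actual database metrics, one option is to time a representative query repeatedly and look at the spread, not just the average. This is a minimal Python sketch. It assumes `run_query` is a callable you write yourself, for example one that executes a query through your MySQL client library. Nothing here is an Alfresco API.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(run_query: Callable[[], object], repeats: int = 20) -> Dict[str, float]:
    """Call run_query() `repeats` times and summarize wall-clock latency in ms."""
    if repeats < 2:
        raise ValueError("need at least two samples for percentiles")
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # hypothetical: executes one representative DB query
        samples.append((time.perf_counter() - start) * 1000.0)
    # With n=20 there are 19 cut points; the last one (index 18) is the 95th percentile.
    p95 = statistics.quantiles(samples, n=20, method="inclusive")[18]
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": p95,
        "max_ms": max(samples),
    }
```

The "inclusive" method keeps the percentile estimate within the observed range. The default "exclusive" method can extrapolate past the slowest sample when there are only a few samples.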

Hyland Developer Evangelist
joe_l3
Active Member II

Re: Solr 4 indexing lag

Hi Angel, thank you for your response.

On the DB side, I enabled slow-query logging and observed the environment for about 10-15 minutes. Everything appears fine:

  • 15-25 JDBC connections on average
  • no slow queries logged (threshold: duration greater than 1 minute)
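One caveat about the 1-minute threshold: MySQL writes a statement to the slow log only when its execution time exceeds `long_query_time` (in seconds). Queries that each take a few seconds therefore never show up, even though many of them can add up within a tracking cycle. The sketch below uses made-up durations to illustrate the effect. The numbers are illustrative, not measurements from this system.

```python
from typing import Iterable


def slow_log_coverage(durations_s: Iterable[float], threshold_s: float) -> dict:
    """Summarize which query durations a slow-query threshold would capture.

    Mirrors MySQL's rule of logging only queries slower than long_query_time.
    """
    durations = list(durations_s)
    logged = [d for d in durations if d > threshold_s]
    return {
        "queries": len(durations),
        "logged": len(logged),
        "total_s": sum(durations),
        "unlogged_s": sum(durations) - sum(logged),
    }


# Hypothetical example: 120 queries of 2.5 s each in one observation window.
sample = [2.5] * 120
print(slow_log_coverage(sample, threshold_s=60))
# {'queries': 120, 'logged': 0, 'total_s': 300.0, 'unlogged_s': 300.0}
print(slow_log_coverage(sample, threshold_s=1))
# {'queries': 120, 'logged': 120, 'total_s': 300.0, 'unlogged_s': 0.0}
```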

I think Solr takes too long to index even a small commit. I noticed that the index folder contains a lot of files of about 2 GB each:

## /solr4/index/workspace/SpacesStore/index
-rw-r--r-- 1 alfresco alfresco 2.2G Mar 10 2023 _256j_Lucene41_0.tim
-rw-r--r-- 1 alfresco alfresco 2.2G Mar 10 2023 _2gzk_Lucene41_0.tim
-rw-r--r-- 1 alfresco alfresco 2.1G Mar 10 2023 _37iw_Lucene41_0.tim
-rw-r--r-- 1 alfresco alfresco 2.1G Mar 10 2023 _2tvr_Lucene41_0.tim
-rw-r--r-- 1 alfresco alfresco 2.0G Mar 9 2023 _1s66_Lucene41_0.tim
....
-rw-r--r-- 1 alfresco alfresco 1.2M Jul 4 10:07 _ifkj.nvd
-rw-r--r-- 1 alfresco alfresco 6.9M Jul 4 10:07 _ifkj_Lucene410_0.dvd
-rw-r--r-- 1 alfresco alfresco 3.6M Jul 4 10:06 _ifkj_Lucene41_0.doc
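Files sharing a prefix (`_256j`, `_ifkj`, ...) belong to the same Lucene segment, and `.tim` files are Lucene term dictionaries. The large March files therefore appear to belong to big, older segments, and the small July files to recently written ones. To see how the index is spread across segments, a Python sketch like the following could be run against the index directory. The function names are hypothetical, and it only reads file names and sizes.

```python
import os
from collections import defaultdict
from typing import Dict, List, Tuple


def segment_sizes(index_dir: str) -> Dict[str, Tuple[int, int]]:
    """Group Lucene files by segment name, e.g. '_256j' from
    '_256j_Lucene41_0.tim' or '_ifkj' from '_ifkj.nvd'.
    Returns {segment: (file_count, total_bytes)}."""
    files: Dict[str, int] = defaultdict(int)
    sizes: Dict[str, int] = defaultdict(int)
    with os.scandir(index_dir) as entries:
        for entry in entries:
            # Skip non-segment files such as segments_N or write.lock.
            if not entry.is_file() or not entry.name.startswith("_"):
                continue
            stem = entry.name[1:]
            # The segment name ends at the first '_' or '.' after the leading '_'.
            cut = min((i for i in (stem.find("_"), stem.find(".")) if i != -1), default=len(stem))
            seg = "_" + stem[:cut]
            files[seg] += 1
            sizes[seg] += entry.stat().st_size
    return {seg: (files[seg], sizes[seg]) for seg in files}


def largest_segments(stats: Dict[str, Tuple[int, int]], top: int = 10) -> List[Tuple[str, Tuple[int, int]]]:
    """Return the `top` segments with the most bytes on disk."""
    return sorted(stats.items(), key=lambda kv: kv[1][1], reverse=True)[:top]
```

For example, `largest_segments(segment_sizes("/solr4/index/workspace/SpacesStore/index"))` would list the ten heaviest segments in that directory.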

In addition, there's another folder named "content" containing small gz files and many numbered subfolders, each holding many more gz files. That folder is very heavy as well: I can't list all of its files within a reasonable time, and even the command "ls -l" run from a terminal takes too long to respond.

## solr4/content/_DEFAULT_/db
drwxrwxr-x 2 solr solr 264K Jul  6 17:09 1962
drwxrwxr-x 2 solr solr 264K Jul  6 14:34 1963
drwxrwxr-x 2 solr solr 264K Jul  6 15:10 2086
drwxrwxr-x 2 solr solr 260K Jul 15 08:33 1105
drwxrwxr-x 2 solr solr 260K Jul 10 10:54 1106
....
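"ls -l" has to stat every entry and sort the whole listing before printing anything, which is slow on directories holding this many files. To get a per-folder file count without that overhead, a streaming walk with Python's `os.scandir` is one option. This is a sketch with a hypothetical function name. It counts files and does not open the `.gz` files.

```python
import os
from typing import Iterator, Tuple


def count_files_per_subdir(root: str) -> Iterator[Tuple[str, int]]:
    """Yield (subdirectory name, number of regular files) for each direct
    subdirectory of root, streaming entries instead of sorting them."""
    with os.scandir(root) as top:
        for sub in top:
            if not sub.is_dir(follow_symlinks=False):
                continue
            count = 0
            with os.scandir(sub.path) as entries:
                for entry in entries:
                    # is_file() can usually use the file type reported by the
                    # directory listing itself, avoiding a stat() per file on Linux.
                    if entry.is_file(follow_symlinks=False):
                        count += 1
            yield sub.name, count
```

Because it is a generator, results for the first folders print as soon as they are counted, rather than after the whole tree has been read.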

It looks like Solr is constantly working on those huge files, and that takes a long time. The curious thing is that searches take 3-7 seconds (acceptable for paginated queries returning 15 items on a huge repository), but indexing is 7-10 minutes behind the DB (the system clocks of both servers are synced).
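To track the "7-10 minutes behind" figure over time, you could compare the commit time of the newest transaction in the database with the commit time of the last transaction Solr reports as indexed. The sketch below assumes two things. First, the DB value comes from something like `SELECT MAX(commit_time_ms) FROM alf_transaction`. Second, the Solr value is read from the core's summary/report output. Treat both sources as assumptions and check the exact table and field names for your version.

```python
from datetime import datetime, timezone


def tx_lag_seconds(db_last_commit_ms: int, solr_last_indexed_commit_ms: int) -> float:
    """Seconds between the newest transaction committed in the repository DB
    and the newest transaction Solr reports as indexed (epoch milliseconds).

    Assumed inputs (not verified for every version):
      - db_last_commit_ms: e.g. SELECT MAX(commit_time_ms) FROM alf_transaction;
      - solr_last_indexed_commit_ms: the last indexed transaction's commit time
        as shown in the Solr core's summary/report output.
    """
    return max(0, db_last_commit_ms - solr_last_indexed_commit_ms) / 1000.0


def to_utc(ms: int) -> str:
    """Render an epoch-millisecond timestamp as ISO 8601 UTC for logging."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()
```

Sampling this once a minute would show whether the lag is steady, growing, or recovering, which helps separate a slow tracker from occasional long pauses.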