Solr Transaction in Index not DB Issues

a554551n
Member II

Hello World,

I'm afraid I have spent days trying to figure this out and I'm totally stumped, so I'm reaching out to all of you for ideas and guidance. A solution would be amazing, but I'm open to any suggestions.

I'm new to this project, to Alfresco, and to Solr as a whole, but I have been tasked with "fixing the issue that causes our end users to call us reporting that a person isn't found when it should be". When we tweak that person's permissions in any way, magic happens and the person is then found in Alfresco.

This led me to believe that Solr hiccups and eventual consistency is never actually reached. The theory is supported by the "Count of transactions in the index but not the DB" value in production being greater than 0 every day. I can reproduce that state, though not consistently, in my test environment by using a Java app and CMIS calls to 'process' records from a DB and create the necessary folders/permissions as indicated by that database. The count seems loosely related to the frequency of our phone calls. Running the action=FIX call lowers the count (rarely to 0), but I'm more interested in finding the cause.
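For anyone who wants to track this count over time rather than eyeball it, below is a minimal sketch of pulling the two relevant values out of a REPORT response. It assumes the report was requested as JSON and that the per-core summary sits under a "report" key keyed by core name; that layout, the helper name index_not_db_summary, and the sample payload with its numbers are illustrative assumptions, not output from a real system.

```python
import json

# Field labels as they appear in the Solr admin report.
COUNT_KEY = "Count of transactions in the index but not the DB"
FIRST_KEY = "First transaction in the index but not the DB"


def index_not_db_summary(report_json, core="alfresco"):
    """Return (count, first_txid) for one core from a REPORT response body.

    Assumes the JSON layout {"report": {<core>: {<label>: <value>, ...}}}.
    first_txid is None when the count is 0.
    """
    payload = json.loads(report_json)
    core_report = payload["report"][core]
    count = int(core_report.get(COUNT_KEY, 0))
    first = core_report.get(FIRST_KEY) if count > 0 else None
    return count, first


# Made-up payload for illustration only; the numbers are not real.
sample = json.dumps({"report": {"alfresco": {COUNT_KEY: 3, FIRST_KEY: 123456}}})
print(index_not_db_summary(sample))
```

Logging the returned pair on a schedule would make it easier to line up rises in the count with the CMIS processing runs or with the support calls.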

I'm happy to try whatever I can on the test environment, but as I'm new I'll need pretty specific guidance.

Alfresco VM:

Alfresco Content Services Home: /home/alfresco/alfresco-one
Alfresco Content Services Edition: Enterprise
Alfresco Content Services Version: 5.2.2
Java Home: /usr/java/jdk1.8.0_91/jre
Java Version: 1.8.0_91
Java VM Vendor: Oracle Corporation
Operating System: Linux
Version: 4.1.12-112.14.13.el7uek.x86_64
Architecture: amd64
Free Memory (GB): 2.54
Maximum Memory (GB): 15.55
Total Memory (GB): 2.97
CPUs: 4

Solr VM:

Search Services  1.2.0  fae4bdb4d5d35b235021c7859bd2ddfc7cbcefe1 - 2018-08-14T22:57:54Z
solr-spec  6.6.0
solr-impl  6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:32:53
lucene-spec  6.6.0
lucene-impl  6.6.0 5c7a7b65d2aa7ce5ec96458315c661a18b320241 - ishan - 2017-05-30 07:29:46

Attached Logs:

*_fresh_start* -> state of the logs after shutting down both Alfresco and Solr, clearing the logs, then starting them both back up

Further questions to help my understanding:
1. What's likely causing this count to rise?
2. Is this count actually indicating an error as it sounds, or am I down the wrong rabbit hole?
3. Any suggestions on what might prevent the count from rising?
4. What does the action=FIX actually do, and why does the count not always reset to 0? (it takes over 30 minutes to run in our production environment and doesn't reliably reset the count to 0, even when nobody is using Solr, so I can't just run it periodically as a 'fix')
5. How would I find more information about the transactions/nodes that are part of the count? I don't know how to take the identified "First transaction in the index but not the DB" and get anything useful from it.
6. Is there a super fast way to obtain the "First transaction in the index but not the DB" (or ideally a list of all of them) so that I can call action=REINDEX (which is very fast) on it/them? (this wouldn't solve the problem, but would be helpful to know)
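To make questions 4 to 6 concrete, here is a minimal sketch of how the admin calls mentioned above could be assembled against the Solr core admin endpoint. The base URL, the core name "alfresco", the transaction id 123456, and the helper name admin_url are placeholders I made up; the use of a TXREPORT action to inspect a single transaction and of the core and txid parameters is my understanding of the Search Services admin handler and should be checked against the documentation for the installed version.

```python
from urllib.parse import urlencode


def admin_url(base, action, core="alfresco", **params):
    """Build a Solr core admin URL for the given action.

    base   : Solr root URL, e.g. "http://localhost:8983/solr" (placeholder)
    action : admin action name such as REPORT, FIX, TXREPORT or REINDEX
    params : extra query parameters, e.g. txid=<transaction id>
    """
    query = {"action": action, "core": core, "wt": "json"}
    query.update(params)
    return f"{base}/admin/cores?{urlencode(query)}"


BASE = "http://localhost:8983/solr"  # assumed host and port
FIRST_TXID = 123456                  # hypothetical id taken from a REPORT

print(admin_url(BASE, "REPORT"))                      # overall consistency report
print(admin_url(BASE, "TXREPORT", txid=FIRST_TXID))   # details for one transaction
print(admin_url(BASE, "REINDEX", txid=FIRST_TXID))    # reindex just that transaction
```

The sketch only prints the URLs, so nothing is sent to Solr; each one can be pasted into a browser or handed to an HTTP client on the test environment first.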