Indexing and solr search problems for large amount of data

nsr
Member II

Hello Alfresco forum.

Apologies, but this is a repost of https://hub.alfresco.com/t5/alfresco-content-services-forum/indexing-and-search-problems-for-large-a..., which was immediately marked as spam and which I have been unable to get unmarked.

Context & Versions:
I am working on an Alfresco project with the following versions:
Alfresco: Community - 6.2.1
Alfresco Search Services: 1.2.0
Solr-spec: 6.6.0

We have a fairly complex data model in Alfresco containing millions of nodes. The current Solr status summary for the alfresco core is shown below. Note that the index is incomplete: not all Alfresco nodes appear to be indexed.

<?xml version="1.0" encoding="UTF-8"?>
<response>
	<lst name="responseHeader">
		<int name="status">0</int>
		<int name="QTime">268</int>
	</lst>
	<lst name="initFailures"/>
	<lst name="status">
		<lst name="alfresco">
			<str name="name">alfresco</str>
			<str name="instanceDir">/opt/alfresco-search-services/solrhome/alfresco</str>
			<str name="dataDir">/opt/alfresco-search-services/data/alfresco/</str>
			<str name="config">solrconfig.xml</str>
			<str name="schema">schema.xml</str>
			<date name="startTime">2024-05-24T12:35:58.489Z</date>
			<long name="uptime">1446560158</long>
			<lst name="index">
				<int name="numDocs">6638674</int>
				<int name="maxDoc">6674937</int>
				<int name="deletedDocs">36263</int>
				<long name="indexHeapUsageBytes">-1</long>
				<long name="version">180791</long>
				<int name="segmentCount">31</int>
				<bool name="current">false</bool>
				<bool name="hasDeletions">true</bool>
				<str name="directory">org.apache.lucene.store.NRTCachingDirectory:NRTCachingDirectory(MMapDirectory@/opt/alfresco-search-services/data/alfresco/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@2f03fff; maxCacheMB=48.0 maxMergeSizeMB=4.0)</str>
				<str name="segmentsFile">segments_f7a</str>
				<long name="segmentsFileSizeInBytes">2110</long>
				<lst name="userData">
					<str name="commitTimeMSec">1718000350060</str>
				</lst>
				<date name="lastModified">2024-06-10T06:19:10.060Z</date>
				<long name="sizeInBytes">10228264804</long>
				<str name="size">9.53 GB</str>
			</lst>
		</lst>
	</lst>
</response>
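
The counters in this response can be cross-checked with a short script. The following is a minimal sketch in Python, assuming the core status XML is available as a string; the function name summarize_core_status and the trimmed SAMPLE are illustrative only and are not part of Alfresco or Solr. It checks that maxDoc equals numDocs plus deletedDocs and reports the share of deleted documents.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the core status response shown above (index counters only).
SAMPLE = """<response>
  <lst name="status">
    <lst name="alfresco">
      <lst name="index">
        <int name="numDocs">6638674</int>
        <int name="maxDoc">6674937</int>
        <int name="deletedDocs">36263</int>
        <int name="segmentCount">31</int>
        <long name="sizeInBytes">10228264804</long>
      </lst>
    </lst>
  </lst>
</response>"""


def summarize_core_status(xml_text, core="alfresco"):
    """Extract the index counters for one core (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    index = root.find(
        f"./lst[@name='status']/lst[@name='{core}']/lst[@name='index']"
    )
    if index is None:
        raise ValueError(f"no index section found for core {core!r}")

    def counter(name):
        # Counters are <int> or <long> elements, so match on the name attribute only.
        element = index.find(f"./*[@name='{name}']")
        if element is None or element.text is None:
            raise ValueError(f"counter {name!r} missing from response")
        return int(element.text)

    names = ("numDocs", "maxDoc", "deletedDocs", "segmentCount", "sizeInBytes")
    summary = {name: counter(name) for name in names}
    # maxDoc counts live plus deleted documents, so the three should add up.
    summary["consistent"] = (
        summary["maxDoc"] == summary["numDocs"] + summary["deletedDocs"]
    )
    summary["deletedRatio"] = summary["deletedDocs"] / summary["maxDoc"]
    return summary


if __name__ == "__main__":
    print(summarize_core_status(SAMPLE))
```

For the response above the counters are internally consistent (6638674 + 36263 = 6674937), so the summary alone does not reveal which nodes are missing. That would require comparing against the node count in the repository, which this sketch does not attempt.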

Observed Problem:

The problem is that not all Alfresco content gets indexed, even after several attempts at a full reindex (https://docs.alfresco.com/search-services/1.3/admin/monitor/#perform-a-full-reindex-with-solr).
This shows up in two ways: simple Solr queries through solr-cmis (LANGUAGE_SOLR_CMIS) that should return thousands of results yield only ~200, and the Solr pod in Kubernetes logs a continuous stream of indexing exceptions.

The exceptions include:
1. "Node index failed and skipped for 866 in Tx 23606587 - java.net.SocketTimeoutException: Read timed out"
2. "Node index failed and skipped for 18554293 in Tx 23620176 - java.lang.NullPointerException"
3. "SolrInformationServer problem - java.net.SocketTimeoutException: Read timed out"
4. "Tracking failed for AclTracker - AlfrescoRuntimeException: 05130611 api/solr/aclchangesets return status:503"

See the section "Specific exceptions in Solr" for full stacktraces.
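
To see which of these failures dominates, the Solr log can be tallied by log level and exception class. The following is a minimal sketch in Python, assuming the log format matches the excerpts in this post (a timestamped WARN/ERROR line followed by the exception on the next line); the function name tally_exceptions and the SAMPLE_LOG excerpt are illustrative only.

```python
import re
from collections import Counter

# Timestamped log line, e.g. "2024-06-11 17:34:50.951 WARN  (thread) ..."
HEADER = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} (\w+)\s+\(")
# First line of a stack trace, e.g. "java.net.SocketTimeoutException: Read timed out"
EXCEPTION = re.compile(r"^([\w.$]+(?:Exception|Error))\b")

# Shortened excerpt built from the four log entries quoted in this post.
SAMPLE_LOG = """\
2024-06-11 17:34:50.951 WARN  (SolrTrackingPool-alfresco-MetadataTracker-6) [   ] o.a.s.SolrInformationServer Node index failed and skipped for 866 in Tx 23606587
java.net.SocketTimeoutException: Read timed out
\tat java.net.SocketInputStream.socketRead0(Native Method)
2024-06-13 07:49:11.081 WARN  (SolrTrackingPool-alfresco-MetadataTracker-3) [   ] o.a.s.SolrInformationServer Node index failed and skipped for 18554293 in Tx 23620176
java.lang.NullPointerException
2024-06-11 17:41:00.845 ERROR (SolrTrackingPool-alfresco-MetadataTracker-3) [   ] o.a.s.SolrInformationServer SolrInformationServer problem
java.net.SocketTimeoutException: Read timed out
2024-06-13 07:33:00.002 ERROR (org.alfresco.solr.AlfrescoCoreAdminHandler@29a5f4e7_Worker-40) [   ] o.a.s.t.AbstractTracker Tracking failed for AclTracker - alfresco
org.alfresco.error.AlfrescoRuntimeException: 05130611 api/solr/aclchangesets return status:503
"""


def tally_exceptions(lines):
    """Count (level, exception class) pairs in an iterable of log lines."""
    counts = Counter()
    pending_level = None  # level of the most recent WARN/ERROR header line
    for line in lines:
        header = HEADER.match(line)
        if header:
            level = header.group(1)
            pending_level = level if level in ("WARN", "ERROR") else None
            continue
        exception = EXCEPTION.match(line)
        if exception and pending_level:
            counts[(pending_level, exception.group(1))] += 1
            pending_level = None  # count each log entry once
    return counts


if __name__ == "__main__":
    for (level, name), count in tally_exceptions(SAMPLE_LOG.splitlines()).most_common():
        print(f"{count:6d}  {level:5s}  {name}")
```

Run against the full pod log (for example, the output of kubectl logs saved to a file), this would show whether the read timeouts or the NullPointerExceptions account for most of the skipped nodes.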


Questions:

It seems that communication between Alfresco and Solr is timing out during indexing. It is not clear to me why, other than the Alfresco repository receiving too much traffic or simply being unable to handle the volume of data in the system.

The pods have sufficient space claimed for the index through persistent volume claims (effectively unlimited), and the CPU and memory requests, along with the JVM memory, seem sufficiently high as well.

How should we go about finding the root cause of the issue?
Is there any Solr or Alfresco repository configuration that we should look at to tune the system?

Let me know if you need further information, and I will gladly provide it.

Specific exceptions in Solr:
This section contains the full stack traces of the exceptions:

1. 

2024-06-11 17:34:50.951 WARN  (SolrTrackingPool-alfresco-MetadataTracker-6) [   ] o.a.s.SolrInformationServer Node index failed and skipped for 866 in Tx 23606587
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
	at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
	at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
	at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
	at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
	at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
	at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at org.alfresco.httpclient.AbstractHttpClient.executeMethod(AbstractHttpClient.java:135)
	at org.alfresco.httpclient.AbstractHttpClient.sendRemoteRequest(AbstractHttpClient.java:111)
	at org.alfresco.httpclient.HttpClientFactory$DefaultHttpClient.sendRequest(HttpClientFactory.java:430)
	at org.alfresco.solr.client.SOLRAPIClient.getNodesMetaData(SOLRAPIClient.java:867)
	at org.alfresco.solr.SolrInformationServer.indexNode(SolrInformationServer.java:1919)
	at org.alfresco.solr.SolrInformationServer.indexNodes(SolrInformationServer.java:2658)
	at org.alfresco.solr.tracker.MetadataTracker$NodeIndexWorkerRunnable.doWork(MetadataTracker.java:882)
	at org.alfresco.solr.tracker.AbstractWorkerRunnable.run(AbstractWorkerRunnable.java:45)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

2.

2024-06-13 07:49:11.081 WARN  (SolrTrackingPool-alfresco-MetadataTracker-3) [   ] o.a.s.SolrInformationServer Node index failed and skipped for 18554293 in Tx 23620176
java.lang.NullPointerException
	at org.alfresco.solr.SolrInformationServer.addContentPropertyToDocUsingCache(SolrInformationServer.java:3249)
	at org.alfresco.solr.SolrInformationServer.addPropertiesToDoc(SolrInformationServer.java:3000)
	at org.alfresco.solr.SolrInformationServer.addToNewDocAndCache(SolrInformationServer.java:2715)
	at org.alfresco.solr.SolrInformationServer.indexNode(SolrInformationServer.java:1975)
	at org.alfresco.solr.SolrInformationServer.indexNodes(SolrInformationServer.java:2658)
	at org.alfresco.solr.tracker.MetadataTracker$NodeIndexWorkerRunnable.doWork(MetadataTracker.java:882)
	at org.alfresco.solr.tracker.AbstractWorkerRunnable.run(AbstractWorkerRunnable.java:45)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

3.

2024-06-11 17:41:00.845 ERROR (SolrTrackingPool-alfresco-MetadataTracker-3) [   ] o.a.s.SolrInformationServer SolrInformationServer problem
java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
	at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
	at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
	at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
	at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
	at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
	at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at org.alfresco.httpclient.AbstractHttpClient.executeMethod(AbstractHttpClient.java:135)
	at org.alfresco.httpclient.AbstractHttpClient.sendRemoteRequest(AbstractHttpClient.java:111)
	at org.alfresco.httpclient.HttpClientFactory$DefaultHttpClient.sendRequest(HttpClientFactory.java:430)
	at org.alfresco.solr.client.SOLRAPIClient.getNodesMetaData(SOLRAPIClient.java:867)
	at org.alfresco.solr.SolrInformationServer.indexNodes(SolrInformationServer.java:2562)
	at org.alfresco.solr.tracker.MetadataTracker$NodeIndexWorkerRunnable.doWork(MetadataTracker.java:882)
	at org.alfresco.solr.tracker.AbstractWorkerRunnable.run(AbstractWorkerRunnable.java:45)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

4.

2024-06-13 07:33:00.002 ERROR (org.alfresco.solr.AlfrescoCoreAdminHandler@29a5f4e7_Worker-40) [   ] o.a.s.t.AbstractTracker Tracking failed for AclTracker - alfresco
org.alfresco.error.AlfrescoRuntimeException: 05130611 api/solr/aclchangesets return status:503
	at org.alfresco.solr.client.SOLRAPIClient.getAclChangeSets(SOLRAPIClient.java:169)
	at org.alfresco.solr.tracker.AclTracker.getSomeAclChangeSets(AclTracker.java:427)
	at org.alfresco.solr.tracker.AclTracker.trackAclChangeSets(AclTracker.java:655)
	at org.alfresco.solr.tracker.AclTracker.trackRepository(AclTracker.java:307)
	at org.alfresco.solr.tracker.AclTracker.doTrack(AclTracker.java:95)
	at org.alfresco.solr.tracker.AbstractTracker.track(AbstractTracker.java:215)
	at org.alfresco.solr.tracker.TrackerJob.execute(TrackerJob.java:47)
	at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:563)