searchService returns same nodeRef twice (duplicate index in solr)

afaust
Master

Re: searchService returns same nodeRef twice (duplicate index in solr)

There is no difference between Enterprise and Community Edition regarding the approach of using a separate core (whether it runs on the same system or on a separate SOLR server does not matter either). Actually, Community Edition is way more flexible here due to the SOLR licensing for Enterprise.


The conversion via JODConverter is not "better" per se, e.g. it is not faster in any way. The only improvement it brings is that JODConverter can be used to utilise parallel instances of LibreOffice and helps with LibreOffice process health by restarting the processes automatically.

Setting Alfresco in 100% read-only mode is impossible unless you use a DB user with only read-access privileges. There are various code pieces during startup that overrule any read-only setting configured via alfresco-global.properties (e.g. the default transaction mode which you can set). And I assume those functions will fail if you use a database user with read-only access. But it is possible to have a 98% read-only Alfresco that is shielded from any user requests and serves only SOLR. A couple of my customers are doing that.

In Community Edition you'd either have to use a 3rd-party clustering module to ensure its caches are consistent or disable the core caches for nodes to make sure that you always read the consistent state from the database.

mehe
Senior Member II

Re: searchService returns same nodeRef twice (duplicate index in solr)

I thought the JOD converter is much faster, because you can use parallel instances of LibreOffice for conversions (as long as you have CPU cores; normally I use 4 to 6 instances on different ports if there are many Office conversions to do). Since the indexer is no longer single-threaded, it can also use the parallel instances for the complex conversions to text, so I thought this would be better than the single LibreOffice thread on Community. Am I missing something or did I misunderstand the whole thing?

vincent-kali
Established Member

Re: searchService returns same nodeRef twice (duplicate index in solr)

Many thanks for all your advice and comments.

Axel, when you say "using a separate core on same system", do you mean two separate SOLR cores running in parallel, both connected to a single Alfresco instance? Is that possible? I have no clue how to do that...

The easiest way (but not the shortest one) for me would be to clone the full system as Martin says...

The link to your TMQ session looks very helpful, I'll check that!

afaust
Master

Re: searchService returns same nodeRef twice (duplicate index in solr)

I am just saying that the transformation via JODConverter is not faster when comparing single-process to single-process. If you have the resources to parallelize, JODConverter will of course be more efficient overall.

afaust
Master

Re: searchService returns same nodeRef twice (duplicate index in solr)

Yes, I do mean running separate cores in parallel. Since a core is defined by a configuration folder in solrHome that contains a core.properties file, you can simply duplicate one of the existing folders (e.g. workspace-SpacesStore or alfresco, depending on how they are called in your system), give it a distinct name and also configure its solrcore.properties to use a distinct storage location for its index. Next time you start SOLR, the new core config folder will be picked up and that core will start tracking Alfresco as per its configuration.
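To make the steps above concrete, here is a minimal Python sketch of duplicating a core folder. It is only an illustration under assumptions: that core.properties carries a `name` key, that solrcore.properties sits in the core's `conf` subfolder, and that the index location is controlled by a `data.dir.root` key. Check the actual files in your solrHome before relying on any of these names. The helper names `set_property` and `clone_core` are hypothetical.

```python
import shutil
from pathlib import Path


def set_property(text: str, key: str, value: str) -> str:
    """Return properties text with `key` set to `value` (appended if absent)."""
    lines = text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments and lines that are not key=value pairs.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def clone_core(solr_home: Path, source: str, target: str, index_root: str) -> Path:
    """Copy the core config folder `source` to `target` inside solr_home.

    ASSUMPTIONS (verify against your installation):
      - core.properties has a `name` key identifying the core
      - conf/solrcore.properties has a `data.dir.root` key for the index location
    """
    src = solr_home / source
    dst = solr_home / target
    if dst.exists():
        raise FileExistsError(f"core folder already exists: {dst}")
    shutil.copytree(src, dst)

    # Give the copy a distinct core name.
    core_props = dst / "core.properties"
    core_props.write_text(set_property(core_props.read_text(), "name", target))

    # Point the copy at its own index storage location.
    solrcore_props = dst / "conf" / "solrcore.properties"
    solrcore_props.write_text(
        set_property(solrcore_props.read_text(), "data.dir.root", index_root)
    )
    return dst
```

Run something like this only while SOLR is stopped. If the existing index data happens to live inside the core folder, `copytree` would copy it too, so check the copy before starting SOLR again.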

douglascrp
Advanced II

Re: searchService returns same nodeRef twice (duplicate index in solr)

You can use JOD converter on Community now.

Check this out: dgcloud / alfresco-remote-jodconverter (Bitbucket)

mehe
Senior Member II

Re: searchService returns same nodeRef twice (duplicate index in solr)

Hi Douglas,

Cool project - looks like you are involved :-)

Thank you for the link, I'll give it a try in the near future. I have been looking for something like that for a long time.


cu, Martin

douglascrp
Advanced II

Re: searchService returns same nodeRef twice (duplicate index in solr)

No, I am not involved in the project.

All I did was to test it, and it works.

vincent-kali
Established Member

Re: searchService returns same nodeRef twice (duplicate index in solr)

OK, I'll test the method you mentioned, and potentially put SOLR on a new server for better performance.

BTW, I can confirm that some duplicate index entries in SOLR are fixed automatically (a query that returns a duplicate DBID on day X returns a single node a day later). Does that make sense to you? (We're running massive bulk loading on this platform.)
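For anyone who wants to track this kind of behaviour over time, a small Python sketch for spotting repeated entries in a result list could look like this. The input is assumed to be a plain list of nodeRef strings collected from a search, however you obtain it; the function name is hypothetical.

```python
from collections import Counter


def duplicate_node_refs(node_refs):
    """Return {nodeRef: count} for every nodeRef that occurs more than once."""
    counts = Counter(node_refs)
    return {ref: n for ref, n in counts.items() if n > 1}
```

Logging the returned mapping for the same query on consecutive days would show whether a duplicate has disappeared on its own.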

thanks,

vincent

andy1
Senior Member

Re: searchService returns same nodeRef twice (duplicate index in solr)

Hi

The SOLR index can fix itself for many issues without reindexing everything.

localhost:8080/solr/admin/cores?action=FIX&wt=json

It should fix any duplicates, stuff that is missing, etc.

You can also reindex nodes that match a query - or just do them one at a time.
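As a rough illustration of calling these admin actions from a script, here is a Python sketch using only the standard library. The `http://` scheme, the `nodeid` and `query` parameter names for a REINDEX action, and the helper names are assumptions on my side, not taken from the post above; check the admin documentation for your SOLR version. Depending on your setup, the SOLR endpoints may also require SSL client certificates, which this sketch does not handle.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def admin_url(base: str, action: str, **params) -> str:
    """Build a cores admin URL such as <base>/admin/cores?action=FIX&wt=json."""
    query = {"action": action, "wt": "json"}
    query.update(params)
    return f"{base.rstrip('/')}/admin/cores?{urlencode(query)}"


def call_admin(url: str, timeout: float = 30.0):
    """Send the admin request and return the parsed JSON response."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    base = "http://localhost:8080/solr"  # assumed scheme, host and port
    # Same request as the FIX URL shown above.
    print(admin_url(base, "FIX"))
    # ASSUMPTION: REINDEX accepts a `nodeid` (DBID) or a `query` parameter.
    print(admin_url(base, "REINDEX", nodeid=1234))
    # call_admin(admin_url(base, "FIX"))  # uncomment to actually send the request
```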

As already mentioned in this thread, there is no reason you cannot have more than one SOLR index built from Alfresco.

With community you can define one index to use. The second one you are building will be ignored - it will add some extra load. Once you are done you just need to flip over the configuration and use the new index. There are no helpful admin screens to do this in community and you will have to stop and restart to pick up the property changes.

If we can nail the root cause of anything like this, it will be at the top of the fix list!

It really helps everyone if you can describe what you think the cause may be and raise it in ALF.

In general the fraction of deleted nodes in the index is not an issue. The background merge operations in Lucene consider this along with other factors when they decide which segments to merge. Index optimisation is not required as it was years ago, and you will in fact throw away some segment-level caches. Lucene improved support for lots of segments quite some time ago. Yes, a few things scale with doc count, but not enough to worry about.

For index rebuild time it depends on what you measure. In SOLR 4 and 6 metadata is indexed ahead of content. SOLR caches the docs it adds to the repository for a number of reasons; one is to avoid content transformation at rebuild. Sharing the content cache is not good, as two indexes may both try to write to it, so you would have to copy it. I will give this some more thought. It would be easy enough to have one index use the cache read-only, for example.

Andy