FINGERPRINT questions in Solr 6

dbiggins
Active Member II

I appreciate the updated info about using the FINGERPRINT function in AFTS queries, but in the process of testing the search, I came up with some questions about how things should work.  Specifically:

  • What is the default overlap percentage if I don't specify it as the second value? When I run the AFTS query 'FINGERPRINT:52763', where 52763 is the DBID, I get 487 results. When I supply any overlap percentage, from 'FINGERPRINT:52763_1' to 'FINGERPRINT:52763_99', I get the same single result, which is the document I am using as the source in the search.
  • I assume that the FINGERPRINT's minhash is generated when the document (mostly PDFs in our case) is created or updated. Should I ALWAYS receive one row (the source document) for the FINGERPRINT query if the text is Tika extractable?
  • I have two PDFs that are almost identical, yet neither shows up in the other's FINGERPRINT query; in fact, both queries return 0 rows. Does that mean there was a problem extracting the text for the minhash? If so, how do I query for documents whose minhash is empty?
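
For anyone who wants to repeat the overlap sweep from the first bullet, here is a minimal Python sketch that only builds the AFTS query strings in the `FINGERPRINT:<dbid>` and `FINGERPRINT:<dbid>_<overlap>` forms quoted above. The helper name `fingerprint_query` is hypothetical, and the sketch does not contact Alfresco or Solr; how you submit the queries depends on your setup.

```python
def fingerprint_query(dbid, overlap=None):
    """Build an AFTS FINGERPRINT term for a node DBID.

    `fingerprint_query` is a hypothetical helper name, not an Alfresco API.
    With no overlap it yields the bare form (FINGERPRINT:52763); with an
    overlap it appends the percentage after an underscore (FINGERPRINT:52763_99).
    """
    if overlap is None:
        return f"FINGERPRINT:{dbid}"
    return f"FINGERPRINT:{dbid}_{overlap}"


# The bare query plus a few overlap values from the 1..99 range tested above.
queries = [fingerprint_query(52763)]
queries += [fingerprint_query(52763, pct) for pct in (1, 50, 99)]

for q in queries:
    print(q)
```

Running each generated query and recording the row count makes it easy to see at which overlap value, if any, the result set changes.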

I am using Alfresco Community 5.2 (201707), and Alfresco Search Services 1.1.

Thanks everyone!

6 Replies
dbiggins
Active Member II

Re: FINGERPRINT questions in Solr 6

According to an older Alfresco JIRA ticket (https://issues.alfresco.com/jira/browse/SEARCH-2), the min_hash value is generated and written to the content store and the Solr index at content ingestion time. The source for FingerPrintComponent.java in the Search Services GitHub repository looks like it reads the value from the Solr field 'MINHASH', but I am not getting any results when I try to query Solr for MINHASH.

dbiggins
Active Member II

Re: FINGERPRINT questions in Solr 6

I am able to get a query using the FINGERPRINT value if I rebuild the entire core, but content that comes in after that does not show in a query.

For instance, I have a document that has been around since the first Solr rebuild, and I can get an AFTS search to work using the queries

TEXT:"[unique text in document]"

for the full text search and

FINGERPRINT:[dbid]

If I add a new document, I can see that the full-text indexer picks it up, there are no errors, and I can find it with the AFTS query

TEXT:"[unique text in new document]"

When I run the FINGERPRINT query, I get no results.  I can wait minutes, hours or days, but it will only get returned if I rebuild the Solr database.  A FIX or REINDEX of the single document doesn't seem to help.

Is there something that needs to be done to trigger the fingerprint calculation? How can I tell what the value is, or whether it has been generated at all?
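
To make the comparison repeatable, here is a rough Python sketch that builds the two AFTS queries from this post as JSON request bodies, one for the full-text check and one for the FINGERPRINT check. It assumes the queries are sent as a body with a `query` object carrying `language` and `query` keys (the shape used by the public Search REST API, if your install exposes it), and that the unique text contains no double quotes. The function names `afts_payload` and `diagnostic_payloads` are hypothetical, and nothing is sent over the network.

```python
import json


def afts_payload(query):
    """Wrap an AFTS query string in a search request body.

    Assumption: the body shape {"query": {"language": ..., "query": ...}}
    matches the search endpoint you are posting to.
    """
    return {"query": {"language": "afts", "query": query}}


def diagnostic_payloads(dbid, unique_text):
    """Build the full-text and FINGERPRINT queries for one document.

    Hypothetical helper; assumes unique_text contains no double quotes.
    """
    return {
        "fulltext": afts_payload(f'TEXT:"{unique_text}"'),
        "fingerprint": afts_payload(f"FINGERPRINT:{dbid}"),
    }


# Placeholder values: substitute the DBID and a phrase from your own document.
payloads = diagnostic_payloads(52763, "unique text in new document")
print(json.dumps(payloads, indent=2))
```

If the full-text body returns the document and the fingerprint body returns nothing, that matches the symptom described above.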

The Michael Suzuki presentation is the only video I can find that covers this.

msuzuki
Active Member

Re: FINGERPRINT questions in Solr 6

Hi Dan, if possible, could you raise this in a JIRA ticket so that we can capture all the information and try to reproduce the issue?

dbiggins
Active Member II

Re: FINGERPRINT questions in Solr 6

Will do. Can you give me a Solr query that should return the minhash value? And am I right in assuming that it should be generated when content is created or updated?

msuzuki
Active Member

Re: FINGERPRINT questions in Solr 6

Try http://localhost:8983/solr/alfresco/afts?fl=[cached]%20MINHASH&indent=on&q=cm_name:bana*&wt=json 

Make sure you change the port to match your Solr instance and replace cm_name:bana* with a query that matches one of your files.
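
If you need to build that URL for several files or Solr instances, a small Python sketch like the following may help. It uses only the standard library and the parameters from the URL above (`fl`, `indent`, `q`, `wt`); the function name `minhash_url` and its `host`, `port`, and `core` arguments are hypothetical conveniences. Note that `urlencode` percent-encodes more characters than the hand-written URL (for example `[`, `]`, `:` and `*`), which decodes to the same values on the server side.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def minhash_url(query, host="localhost", port=8983, core="alfresco"):
    """Build the /afts URL that asks Solr to return the MINHASH field.

    Hypothetical helper: host, port and core are assumptions about your
    deployment and default to the values in the URL quoted above.
    """
    params = {
        "fl": "[cached] MINHASH",  # field list from the suggested URL
        "indent": "on",
        "q": query,
        "wt": "json",
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return f"http://{host}:{port}/solr/{core}/afts?" + urlencode(params, quote_via=quote)


url = minhash_url("cm_name:bana*")
print(url)

# Round-trip check: the decoded parameters match what was passed in.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0], "|", decoded["fl"][0])
```

Paste the printed URL into a browser or fetch it with your HTTP client of choice.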

dbiggins
Active Member II

Re: FINGERPRINT questions in Solr 6

My apologies for the delay, but I have created a JIRA ticket:

[ALF-22032] FINGERPRINT not getting consistently recreated after metadata change - Alfresco JIRA 

Unfortunately, I could not create it on the search project, so I'm not sure that you would see it. 

I am seeing this behavior with Alfresco Search Services 1.1.1 and 1.2.