I appreciate the updated info about using the FINGERPRINT function in AFTS queries, but in the process of testing the search, I came up with some questions about how things should work. Specifically:
What is the default overlap percentage if I don't specify it as the second value? When I run the AFTS query 'FINGERPRINT:52763', where 52763 is the DBID, I get 487 results. When I supply any overlap percentage ranging from 'FINGERPRINT:52763_1' to 'FINGERPRINT:52763_99', I get the same 1 result, which is the document I am using as the source in the search.
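To make the two query forms above concrete, here is a minimal sketch of how the query strings are put together. The helper name `fingerprint_query` is hypothetical, and the accepted range of 1 to 100 for the overlap value is an assumption, not something confirmed by the documentation.

```python
from typing import Optional


def fingerprint_query(dbid: int, overlap: Optional[int] = None) -> str:
    """Build an AFTS FINGERPRINT term for a node DBID.

    `overlap` is the optional percentage appended after an underscore.
    The 1..100 range check is an assumption made for this sketch.
    """
    if overlap is None:
        # Form without an explicit overlap: the server applies its default.
        return f"FINGERPRINT:{dbid}"
    if not 1 <= overlap <= 100:
        raise ValueError("overlap must be between 1 and 100")
    return f"FINGERPRINT:{dbid}_{overlap}"


# The three queries mentioned in the question.
print(fingerprint_query(52763))
print(fingerprint_query(52763, 1))
print(fingerprint_query(52763, 99))
```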
I assume that the FINGERPRINT's minhash is generated when the doc (mostly PDFs in our case) is created or updated. Should I ALWAYS receive one row (the source document) for the FINGERPRINT query if the text is Tika extractable?
I have two PDFs that are almost identical, but neither shows up in the other's FINGERPRINT query; in fact, both queries return 0 rows. Does that mean there was a problem extracting the text for the minhash? If so, how do I query whether the minhash is empty?
I am using Alfresco Community 5.2 (201707), and Alfresco Search Services 1.1.
According to an older Alfresco Jira ticket (https://issues.alfresco.com/jira/browse/SEARCH-2), the min_hash value is generated and written to the content store and the Solr index at content ingestion time. The source for FingerPrintComponent.java in the Search Services GitHub repository looks like it reads the value from the Solr field 'MINHASH', but I am not getting any results when I try to query Solr for MINHASH.
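For reference, this is roughly the kind of direct Solr request meant by "query Solr for MINHASH". It is only a sketch: the base URL `http://localhost:8983/solr`, the core name `alfresco`, and the helper name `minhash_select_url` are assumptions, and whether the MINHASH field can be queried with a wildcard like this at all is exactly what is in question.

```python
from urllib.parse import urlencode


def minhash_select_url(base_url: str, core: str, rows: int = 5) -> str:
    """Build a Solr /select URL that asks for documents with any MINHASH value.

    base_url and core are placeholders for a local installation.
    """
    params = {"q": "MINHASH:*", "rows": rows, "wt": "json"}
    return f"{base_url}/{core}/select?{urlencode(params)}"


# Assumed local defaults; adjust host, port, and core for a real install.
print(minhash_select_url("http://localhost:8983/solr", "alfresco"))
```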
I am able to get results from a FINGERPRINT query if I rebuild the entire core, but content that comes in after the rebuild does not show up in a query.
For instance, I have a document that has been around since the first Solr rebuild, and I can get an AFTS search to work using the queries
TEXT:"[unique text in document]"
for the full text search and
If I add a new document, I can see that the full-text indexer picks it up, there are no errors, and I can see it when I run an AFTS query
TEXT:"[unique text in new document]"
When I run the FINGERPRINT query, I get no results. I can wait minutes, hours or days, but it will only get returned if I rebuild the Solr database. A FIX or REINDEX of the single document doesn't seem to help.
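The single-document REINDEX mentioned above is the Solr admin action with a `nodeid` parameter. A minimal sketch of how that URL is built, assuming Solr is reachable at `http://localhost:8983/solr` (the helper name `reindex_node_url` is hypothetical):

```python
from urllib.parse import urlencode


def reindex_node_url(base_url: str, node_dbid: int) -> str:
    """Build the Solr admin URL that requests a REINDEX of one node by DBID.

    base_url is a placeholder; a real installation may use a different
    host, port, or require client certificates.
    """
    params = {"action": "REINDEX", "nodeid": node_dbid}
    return f"{base_url}/admin/cores?{urlencode(params)}"


print(reindex_node_url("http://localhost:8983/solr", 52763))
```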
Is there something that needs to be done to trigger the minhash calculation for new content? How can I tell what the value is, or whether it was generated at all?
The Michael Suzuki presentation is the only video I can see out there that goes over this.