Can anyone point me in the direction of any docs describing how the fingerprint / MinHash works with Alfresco 5.2, either through Share or the REST API? I am assuming we can use this for finding similar docs ('find more like this'), and that we would have to provide the hash of the content to be compared against, plus some threshold value for what 'close' means, but it's not clear to me how this is set up or used.
Using the search REST API (see the Alfresco Content Services REST API Explorer), you can search on or include the "FINGERPRINT" field.
You can append a similarity percentage to the desired fingerprint, separated by a "_".
So: get a document's fingerprint value through a search query, append for instance _50 to the value, and search for the result (see the demo near the end of Alfresco Tech Talk Live 103).
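As a rough sketch of the second step, the query body for the search API could be built like this. The endpoint path, the "afts" query language, and the helper name build_fingerprint_query are assumptions for illustration, and the value "1234" is a made-up placeholder, so check them against the REST API Explorer for your installation.

```python
import json

# Assumed path of the public search endpoint; verify in the REST API Explorer.
SEARCH_PATH = "/alfresco/api/-default-/public/search/versions/1/search"


def build_fingerprint_query(fingerprint_value: str, overlap_percent: int) -> dict:
    """Build a search request body of the form FINGERPRINT:<value>_<percent>.

    Hypothetical helper: fingerprint_value is whatever the first search
    returned for the document, passed through unchanged.
    """
    if not 0 <= overlap_percent <= 100:
        raise ValueError("overlap_percent must be between 0 and 100")
    return {
        "query": {
            "language": "afts",  # assumed query language
            "query": f"FINGERPRINT:{fingerprint_value}_{overlap_percent}",
        }
    }


# "1234" is a placeholder, not a real fingerprint value.
body = build_fingerprint_query("1234", 50)
print(json.dumps(body))
```

The resulting JSON would then be sent as a POST to SEARCH_PATH with whatever HTTP client and authentication you already use.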
To determine the similarity between documents you have already found, you could calculate a string distance, such as the Levenshtein distance, between their fingerprint values. A lower distance means greater similarity.
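One way to try the string-distance idea is a plain Levenshtein implementation, sketched below. It assumes the fingerprint values are available as strings; the function name levenshtein and the sample inputs are illustrative only, not real fingerprints.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance (insertions, deletions, substitutions)."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


# Placeholder strings standing in for two fingerprint values.
distance = levenshtein("kitten", "sitting")
print(distance)
```

To compare several documents, compute the distance for each pair and sort ascending, so the closest pairs come first.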
For more information, see https://community.alfresco.com/people/andy1/blog/2017/05/12/document-fingerprints, which covers the topic in depth.
Hope you find this helpful.