Does Alfresco provide checksum for managed content to aid detecting duplicates?

kbala
Active Member


I am looking to save a checksum for managed content. We have multiple sources that save images to Alfresco and, unfortunately, we end up storing a lot of duplicates. I am looking into ways to alleviate the problem.

5 Replies
jpotts
Advanced II

Re: Does Alfresco provide checksum for managed content to aid detecting duplicates?

Not out of the box, but you can add it easily. I did something similar for another client. You can create a behavior that computes a hash on the content stream every time it is updated, and store that hash as a property on the content. Then, finding duplicates is just a matter of running a search for all documents that have that same hash value.

I think I saw that version 6.x added something related to checksums but I have not investigated to see if it is similar to what I describe above.
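As a rough illustration of the hashing approach described above, here is a minimal sketch in plain Java. It only shows the two building blocks: streaming content through SHA-256 and grouping items that share a digest. The class `ContentHasher` and its methods are hypothetical names, not part of the Alfresco API. In a real behavior the input stream would come from the node's content reader and the hash would be stored as a custom property; that wiring is left out here. Note that an exact hash only catches byte-identical files, so a re-encoded or resized image would not match.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not an Alfresco class.
final class ContentHasher {

    private ContentHasher() {}

    /** Streams the content through SHA-256 and returns the digest as lowercase hex. */
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            // Read in chunks so large files are never held in memory at once.
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Groups item names by content hash and keeps only the groups with
     * more than one member, i.e. the byte-identical duplicates.
     * In a repository this step would be a search on the stored hash property.
     */
    static Map<String, List<String>> findDuplicates(Map<String, byte[]> contentByName) {
        Map<String, List<String>> namesByHash = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> entry : contentByName.entrySet()) {
            String hash = sha256Hex(new ByteArrayInputStream(entry.getValue()));
            namesByHash.computeIfAbsent(hash, k -> new ArrayList<>()).add(entry.getKey());
        }
        namesByHash.values().removeIf(names -> names.size() < 2);
        return namesByHash;
    }
}
```

Computing the hash when content is written, rather than at search time, keeps the duplicate lookup down to a simple property match.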

mehe
Senior Member II

Re: Does Alfresco provide checksum for managed content to aid detecting duplicates?

Jeff is right when he mentions „something related“: the newer versions have document fingerprinting.

Document Fingerprints | Alfresco Documentation 

You can also find related documents with fingerprinting. 

I first saw it in a Tech Talk Live session, and there is also an excellent article from Andy Hind about document fingerprints:

https://community.alfresco.com/people/andy1/blog/2017/05/12/document-fingerprints 

Maybe this helps...

kbala
Active Member

Re: Does Alfresco provide checksum for managed content to aid detecting duplicates?

Thank you. Before implementing something ourselves to save the hashes, I wanted to see whether Alfresco had something to offer so that we would not reinvent the wheel. It looks like we are on v5.2, and I am not sure whether an upgrade is pending, so we might not be able to use the Document Fingerprint option yet.

kbala
Active Member

Re: Does Alfresco provide checksum for managed content to aid detecting duplicates?

I am not finding much documentation on fingerprinting of images and other media content. Any idea whether this was designed to cater only to text content?

andy1
Senior Member

Re: Does Alfresco provide checksum for managed content to aid detecting duplicates?

Hi

Fingerprinting was designed for text only.

If you can turn your image into a text representation, then you can use it.

Andy