Option to identify file corruption

Bill_Tim
Member II

Is there any option to identify file corruption?

  • During file upload, is there any possibility to perform a file corruption check and allow or block the user action accordingly?

  • File corruption can also happen during the lifetime of documents in the repository. Is there any option to run a file corruption check and get a report so that corrective action can be taken?
1 Reply
afaust
Master

Re: Option to identify file corruption

Neither point is covered by out-of-the-box functionality, but something like this could be implemented using Alfresco behaviours and background processes (i.e. cron jobs based on the Quartz library). For the file upload, the client would have to submit a verifiable checksum as a metadata property, which could then be verified in a custom OnContentUpdatePolicy behaviour. If the verification fails, the behaviour throws an exception, which rolls back the current transaction and results in the uploaded content being deleted (unless WORM storage is used). The same checksum could then be stored as part of the document metadata (this requires a custom content model) and used in a regularly running Quartz job to re-verify the file contents.
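To make the verification step a bit more concrete, here is a minimal, Alfresco-independent sketch of the checksum comparison itself. The class and method names (`ChecksumVerifier`, `sha256Hex`, `verify`) are made up for illustration, SHA-256 is just an assumed choice of algorithm, and the generic `IllegalStateException` stands in for whatever runtime exception the behaviour would throw to trigger the rollback. In an actual behaviour or Quartz job, the stream would presumably come from a content reader for the node and the expected value from the custom checksum property; neither is shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Alfresco: computes and verifies a SHA-256 checksum.
final class ChecksumVerifier {

    // Streams the content through SHA-256 and returns the lowercase hex digest.
    static String sha256Hex(InputStream in) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Throws if the computed checksum differs from the one the client submitted.
    // Inside a behaviour, an unchecked exception like this is what would cause the rollback.
    static void verify(InputStream content, String expectedHex) {
        String actual = sha256Hex(content);
        if (expectedHex == null || !actual.equalsIgnoreCase(expectedHex.trim())) {
            throw new IllegalStateException(
                "Checksum mismatch: expected " + expectedHex + " but computed " + actual);
        }
    }

    public static void main(String[] args) {
        byte[] upload = "abc".getBytes(StandardCharsets.UTF_8);
        // SHA-256 of the three bytes "abc"
        String submitted = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
        verify(new ByteArrayInputStream(upload), submitted);
        System.out.println("checksum ok");
    }
}
```

The same `verify` call could be reused by the periodic re-verification job, with the stored metadata value as `expectedHex`; a job would more likely log or report a mismatch than throw.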

Typically though, I would expect this kind of long-term file corruption detection to be done in the storage system itself and kept out of Alfresco. Chances are that if content is important enough for file corruption to be a concern, a professional storage solution is in use which already includes checksum and even correction capabilities.

That leaves only the upload scenario. For this, one may not want the validation to happen inside Alfresco, but rather on the client side, i.e. the client uploads the content, keeps the pre-calculated checksum for itself, and asks Alfresco to provide a checksum of the file after upload for verification. As there is no standard for this yet (e.g. https://datatracker.ietf.org/doc/draft-ietf-httpbis-digest-headers/ is currently only a draft), there is obviously no support yet in Alfresco, but that does not mean one could not implement a custom upload web script / API that supports the Digest and Want-Digest HTTP headers.
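As a rough illustration of the client-side check, the sketch below builds a `sha-256=<base64>` value in the style of the RFC 3230 `Digest` header and compares it with a value the server is assumed to return. The class and method names (`DigestHeaderSketch`, `digestValue`, `matches`) are hypothetical, the custom upload endpoint that would answer a `Want-Digest` request does not exist in Alfresco, and the exact header syntax differs between RFC 3230 and the revisions of the draft linked above, so treat the format as an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical client-side helper: compares a locally computed digest
// with the Digest header value returned by a (custom) upload endpoint.
final class DigestHeaderSketch {

    // Base64-encoded SHA-256 of the content the client is about to upload.
    static String base64Sha256(byte[] content) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(content);
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Header value in RFC 3230 style, e.g. "sha-256=<base64>".
    static String digestValue(byte[] content) {
        return "sha-256=" + base64Sha256(content);
    }

    // True if the server's Digest header contains a sha-256 entry equal to the local one.
    // The header may list several comma-separated algorithms; algorithm names are
    // compared case-insensitively, the base64 value must match exactly.
    static boolean matches(byte[] localContent, String serverDigestHeader) {
        if (serverDigestHeader == null) {
            return false;
        }
        String expected = base64Sha256(localContent);
        for (String part : serverDigestHeader.split(",")) {
            String[] kv = part.trim().split("=", 2); // limit 2 keeps base64 padding intact
            if (kv.length == 2
                    && kv[0].trim().equalsIgnoreCase("sha-256")
                    && kv[1].trim().equals(expected)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] content = "abc".getBytes(StandardCharsets.UTF_8);
        String kept = digestValue(content);      // client keeps this before uploading
        String fromServer = kept;                // stand-in for the response header
        System.out.println(matches(content, fromServer) ? "upload intact" : "upload corrupted");
    }
}
```

The client would send `Want-Digest: sha-256` with the upload request and pass the returned `Digest` header to `matches`; on a mismatch it can retry or delete the uploaded node.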