Hello
I would like to mirror the document library's files and folder structure to a backup drive every night (so downloading the whole document library each time is not an option). How can I do it?
Alfresco stores files in a different structure and under different names, so it is impossible to simply copy the files.
The best backup approach is to (1) dump your database and then (2) back up your entire content store directory.
However, if you want to mirror your document library to an external drive, there are several ways to do it...
1. Your document library is accessible via WebDAV. You could mount it as a drive, then use rsync to copy it to a backup volume. Obviously you will not get the metadata or the version history if you go this route.
2. You could write a script that uses the REST API or CMIS to crawl the document library and write new and updated files to the backup volume.
3. You could use behaviors to track when files are new or updated and then put a message on a queue. In a separate process, have Java code that subscribes to the queue and, when it sees an event, fetches the file via the REST API or CMIS and writes it to the backup volume.
I've successfully used option 3 for a client. We added UI actions to Share that allow authorized end-users to "flag" a document for backup. It works as described above, but instead of watching for any create/update, the behavior watches for the presence of a "marker" aspect that indicates the file should be archived.
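Option 2 might look roughly like the following Python sketch. It assumes cmislib-style objects: folders that offer getChildren(), documents that offer getName() and getContentStream(), and a properties dictionary carrying cmis:baseTypeId and cmis:lastModificationDate. The function name mirror_folder and the timestamp comparison are illustrative choices, not part of any library.

```python
import os
import shutil


def mirror_folder(folder, target_dir):
    """Recursively copy new and updated documents from a CMIS folder.

    `folder` is assumed to be a cmislib-style folder object.
    Returns the list of local paths that were written.
    """
    written = []
    os.makedirs(target_dir, exist_ok=True)
    for child in folder.getChildren():
        local_path = os.path.join(target_dir, child.getName())
        if child.properties.get("cmis:baseTypeId") == "cmis:folder":
            # Descend into sub-folders and collect what they wrote.
            written.extend(mirror_folder(child, local_path))
            continue
        modified = child.properties.get("cmis:lastModificationDate")
        # Work at whole-second resolution to avoid float rounding issues.
        remote_ts = int(modified.timestamp()) if modified is not None else None
        if remote_ts is not None and os.path.exists(local_path):
            # Skip files whose local copy is at least as new as the repository copy.
            if os.path.getmtime(local_path) >= remote_ts:
                continue
        stream = child.getContentStream()
        try:
            with open(local_path, "wb") as out:
                shutil.copyfileobj(stream, out)
        finally:
            stream.close()
        if remote_ts is not None:
            # Stamp the local file with the repository's modification time.
            os.utime(local_path, (remote_ts, remote_ts))
        written.append(local_path)
    return written
```

With a real repository, the starting folder would come from something like repo.getObjectByPath("/Sites/test/documentLibrary") (the path is an assumption). Like option 1, this copies content only, not metadata or version history.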
Thank you for such an extensive answer. I tried to implement option 2: a script in Python 3 using your cmislib3 (unfortunately, I can only find docs for cmislib :/ ).
It is easy to traverse the directories, but I can't download files.
This is the code I use:
content = repo.getObjectByPath("/Sites/test/test.png")
o = open(content.getName(), 'wb')
result = content.getContentStream()
o.write(result.read())
result.close()
o.close()
The downloaded file has a size of 0 bytes... (result is a str, not binary, so the write throws an error. I can change 'wb' to 'w', but then the binary files are invalid.)
EDIT. It looks like Python 3.8 is incompatible.
EDIT2. In binding.py you have to replace lines 1938-1940 with 'return io.BytesIO(result.content)'
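Why the EDIT2 change helps can be sketched as follows: in Python 3, a file opened with 'wb' accepts only bytes, so the content stream has to yield bytes rather than str. Wrapping the raw response body in io.BytesIO gives a file-like object whose read() returns bytes. The FakeResponse class below is a hypothetical stand-in for the HTTP response that binding.py holds in result; only its content attribute (assumed to be bytes, as the EDIT2 line implies) matters here, and the helper names are made up for illustration.

```python
import io
import shutil


class FakeResponse:
    """Hypothetical stand-in for the HTTP response object inside binding.py."""

    def __init__(self, content):
        self.content = content  # raw body as bytes


def as_binary_stream(response):
    """Wrap the raw response bytes in a file-like object, as EDIT2 does."""
    return io.BytesIO(response.content)


def save_stream(stream, path):
    """Copy a binary stream to disk in chunks and close it afterwards."""
    try:
        with open(path, "wb") as out:
            shutil.copyfileobj(stream, out)
    finally:
        stream.close()
```

With the patched library, the original snippet should then work unchanged, since content.getContentStream() returns a stream of bytes that can be written in 'wb' mode.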
Hi @upforsin,
Just to clarify - if you make the change in EDIT2 can you now download successfully? If so, is this problem now solved?
Thanks,
Yes, thanks to @jpotts I successfully managed to copy the document structure to FTP.
First, I tried option 2: after 3 days of writing Python code I managed to download files (after modifying the CMIS library), but then I couldn't upload some files to FTP (because binary files had characters incompatible with Latin-1 encoding*).
Then, in 10 minutes, I implemented option 1 using WebDAV and lftp. Thank you once again, Jeff!
*Python error: Unresolved “UnicodeEncodeError: 'latin-1' codec can't encode characters in position xxx: ordinal not in range(256)”
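A possible cause, offered only as a guess since the thread leaves the error unresolved: this exception is raised whenever a str containing characters outside Latin-1 (code points above 255) is encoded with that codec. Python's ftplib encoded commands, including file names, as Latin-1 by default before Python 3.9, so a non-Latin file name, or file content passed as str instead of bytes, could trigger it. The helper name and sample file names below are hypothetical.

```python
def fits_latin1(text):
    """Return True if `text` can be encoded with the Latin-1 codec."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Flag the names that would raise UnicodeEncodeError under Latin-1.
names = ["report.pdf", "zażółć.png"]
problem_names = [n for n in names if not fits_latin1(n)]
```

If this is indeed the cause, opening local files in 'rb' mode and uploading them with FTP.storbinary keeps the content as bytes, and setting the connection's encoding attribute to 'utf-8' may help with file names, provided the server accepts UTF-8.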
Here is the script in case anyone is interested: https://pastebin.com/srfbtvJT