Could someone please help me find the best practices for handling Alfresco 6 replication, keeping the points below in mind?
Any pointers ... much appreciated.
Thanks a lot in advance.
I don't know the purpose of your content replication. Replication jobs have certain limitations, as they keep the content read-only in the target repository.
But since you mentioned a database transfer, I guess you want to set up a new instance of Alfresco from the existing instance.
If that is the purpose then the best way to do it is to back up and restore.
Take a backup of the indexes, content stores, database and any customizations you have made, then restore all of these to the newly installed instance.
There are standard backup and restore steps in the documentation http://docs.alfresco.com/6.1/concepts/ch-backup-restore.html
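As a rough illustration of the backup step, here is a minimal Python sketch that only builds the commands for a cold backup (Alfresco stopped). It assumes a PostgreSQL database and a standard alf_data layout; the function name, paths, database name and user are all hypothetical, and indexes and customizations would need the same treatment. The linked documentation remains the reference for the actual procedure.

```python
from typing import List


def cold_backup_commands(alf_data: str, backup_dir: str,
                         db_name: str, db_user: str) -> List[List[str]]:
    """Build (but do not run) commands for a cold backup.

    Assumptions (not from the thread): PostgreSQL is the repository
    database and Alfresco is stopped while the commands run.
    """
    return [
        # Dump the repository database in pg_dump's custom format.
        ["pg_dump", "-U", db_user, "-F", "c",
         "-f", f"{backup_dir}/{db_name}.dump", db_name],
        # Archive the content store directory below alf_data.
        ["tar", "-czf", f"{backup_dir}/contentstore.tar.gz",
         "-C", alf_data, "contentstore"],
    ]


if __name__ == "__main__":
    for cmd in cold_backup_commands("/opt/alfresco/alf_data", "/backup",
                                    "alfresco", "alfresco"):
        print(" ".join(cmd))
```

The commands are returned as lists so they can be reviewed first and then passed to subprocess.run one by one.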
Thanks a lot, Kintu Barot, for the reply.
Yes, we want to create a new (read-only) instance of Alfresco in a different geographic region, but the challenge is that
the data should be replicated immediately (with permissible latency), including the revision history and the users/groups created.
Are you expecting to run or log in to the second, replicated Alfresco system? If not, make sure to replicate your alf_data/contentstore server directory (rsync jobs, block-based replication, ...) and set up database replication. You should not start the second Alfresco system if you don't know what you're doing, since you may corrupt your repository database.
If you intend to use both Alfresco installations at the same time (even in "read-only" mode) you need something like Alfresco cluster technology, which is not part of the Community Edition (but there is a commercial cluster offer for the CE and a specific geo repo replication solution for EE).
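For the contentstore part, a minimal Python sketch of what an rsync job could look like is shown below. It only builds the command; the host name and paths are hypothetical, and database replication has to be set up separately with the tools of your database.

```python
from typing import List


def contentstore_rsync_command(source_dir: str, target_host: str,
                               target_dir: str,
                               dry_run: bool = False) -> List[str]:
    """Build an rsync command that mirrors a contentstore to a standby host.

    The trailing slash on the source tells rsync to copy the directory's
    contents rather than the directory itself.
    """
    cmd = ["rsync", "-a"]
    if dry_run:
        # Only report what would be transferred.
        cmd.append("--dry-run")
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(f"{target_host}:{target_dir}")
    return cmd


if __name__ == "__main__":
    print(" ".join(contentstore_rsync_command(
        "/opt/alfresco/alf_data/contentstore",
        "standby.example.com",  # hypothetical standby host
        "/opt/alfresco/alf_data/contentstore",
        dry_run=True,
    )))
```

Running it with dry_run=True first is a cheap way to check the paths before any data is copied.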
Alfresco Replication is not real replication but a copy job that creates a new, independent node in the second system. The copied node has no metadata and a different node ID. Alfresco Replication is not designed as a disaster recovery solution. We implemented a more sophisticated solution, a more or less one-way asynchronous Server2Server sync which also replicates defined metadata, but maybe this is not what you expect either?
If you expect to run more than one Alfresco system in active mode, clustering is not the perfect solution because it will slow your whole system down dramatically unless it runs in a low-latency network. Instead you would need an asynchronous service with a locking mechanism that all involved systems respect, to avoid conflicts that are very hard to handle. Unfortunately I know of no such solution, but maybe someone will correct me?
Could you please describe the solution and requirements in more detail, and what your customer / your users expect?
Thanks a lot for the advice, Heiko Robert.
Here is my problem statement.
Our alfresco-60 setup is complete and working well in production.
Now we need to build a replica of that system in a different geographic location.
The new system should function like...!
I hope I am clear. Any suggestions would help me a lot.
Also, please clarify: if Alfresco clustering can be a solution at all, can I have 2 clusters in different geographic locations on Amazon Cloud?
Do we anticipate any architectural issues here...?
The scenario you describe is not officially supported by Alfresco.
The Alfresco partner IT-Novum has an add-on for the Enterprise Edition which should cover your use case: Alfresco Geo-Caching by it-novum | Alfresco
If you need this functionality for the Community Edition you may:
Another, easier setup would be to use our (ecm4u's) Alfresco Server2Server Sync module, which implements a unidirectional, query-based sync to copy/modify/delete nodes from one site to another. The logic is similar to rsync but built on top of the Alfresco API. We support types, aspects and metadata, but not versions or comments. The documents in the target will get new nodeRefs. The second system doesn't need to run in read-only mode, but users on site B shouldn't have write permissions on the synced directories.
Alfresco Clustering is a bad idea for WAN / high-latency networks and does not take care of distribution or replication either. It may work if your network is fast enough to use the repository DB from site Prod1 directly (no DB replication). If not, you may run into trouble with the time lag of the DB replication, since the Alfresco cluster is more or less a cluster of the cache. So if the cache is newer than the DB you will get "unexpected" behavior ...
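To illustrate the general idea of a unidirectional, rsync-like sync, here is a minimal Python sketch of the planning step. It is not the ecm4u module's actual code: the function name, the SyncPlan type and the use of a path-to-modified-timestamp mapping are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SyncPlan:
    copy: List[str] = field(default_factory=list)    # missing on target
    update: List[str] = field(default_factory=list)  # differs on target
    delete: List[str] = field(default_factory=list)  # gone from source


def plan_one_way_sync(source: Dict[str, str],
                      target: Dict[str, str]) -> SyncPlan:
    """Compare two path -> modified-timestamp maps and plan a one-way sync.

    The source is treated as authoritative; the target is never read back
    into the source.
    """
    plan = SyncPlan()
    for path, modified in sorted(source.items()):
        if path not in target:
            plan.copy.append(path)
        elif target[path] != modified:
            plan.update.append(path)
    plan.delete = sorted(p for p in target if p not in source)
    return plan


if __name__ == "__main__":
    site_a = {"/docs/a.pdf": "2019-05-01", "/docs/b.pdf": "2019-05-03"}
    site_b = {"/docs/b.pdf": "2019-05-02", "/docs/old.pdf": "2019-01-01"}
    print(plan_one_way_sync(site_a, site_b))
```

Because the target is only ever written to by the sync, conflicts cannot arise as long as users on site B have no write permissions on the synced directories.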