a) FSTR - File system transfer receiver(from alfresco to file system) .
b) Alfresco replication service - (From one alfresco to other alfresco)
On admin console - we have an option to disable Read only replication, by default it is selected
For FSTR - i had tried to disable it , but if you try to update the content on source and then try to replicate it will have an issue and junk file(.ftr something ) will be formed. So it should be read only , and basically one way replication.
Also it is very very slow - takes lot of time. On FSTR side also you have a file system with executable jar and internal db which gets updated in the process , taking lot of time.
Had issues with linking module.
For alfresco replication service -
a) Very very slow -
b) Lots of timeouts , since if you have a web proxy above tomcats - the socket will is most likely to be closed as per the configuration.
c) Applying the ssl and doing replication over ssl is also very difficult and very very slow.
Just to add, there are faster approaches
We used Goodsync tool and got the content synced at regular intervals via webDAV . Were able to sync effectively onto filesystem using Goodsync tool.
The idea was to replicate the content of a part our production server to a "form" server used for tests only. Only a part because just the financial documents stored into a unique folder and classified into an arborecence. About 200 000 documents. We just need to keep the same IUD/noderef and be able to modify the replicated content.
So if I understand your thought the replication mechanism is not recommended !
Yes Goodsync would just the contents and sync the contents as per the contents at the source.
- Goodsync was used when syncing is important at regular scheduled intervals - but UUID was not available at destination. It is just a sync tool like winscp with better capabilities on scheduling and also from different trype of sources like S3 etc .
Not aware of ACP .
Jeff is the best person i am aware of who can help as per your scenario. I just though to share some experience with Replication service and also if you have a matching usecase can try this.
Do you need to sync 200,000 documents or are you syncing only a few documents and NOT syncing 200,000 documents? If you are syncing all 200,000 you will definitely not use the replication service.
Also, 200,000 is probably too many docs to put in an ACP file although it depends on how big the documents are. If you want to use ACP you will probably have to do many smaller batches.
I recommend an asynchronous approach. At a very very high-level, you can put messages on a queue or an event stream and then have an integration server that is monitoring the queue/stream so that it can replicate the changes to another server. I've done this with ActiveMQ. I wrote a behavior to watch for the changes I'm interested in. The behavior then puts a message on a queue that essentially says, "This object changed". Then, over in my other Alfresco server, I have a listener that is subscribed to the queue. When it sees a message it grabs the node reference from the message then calls the source Alfresco server to fetch the object and persist it into the repo.
Similarly, when an object is deleted, a message goes on the queue, then the target server sees that message and deletes the object on the target server.
In my example, I didn't care about node refs changing between the two servers. I just stored the originating node reference as a property on the target server so I could track back to the source.
In your case, instead of fetching the object you could trigger an export into an ACP file, then on the target server, import the ACP to preserve the node reference.
My example does one-way sync, which it sounds like would work for you. The folks at Parashift have a product that does two-way synchronization leveraging event streams and Apache Camel. Here is a blog post they wrote about it: Stream Processing with Alfresco – Parashift . They did a presentation at DevCon in Zaragoza but I can't seem to find that presentation at the moment.
If you want to try using streams instead of queues, you might take a look at a project I have on github that writes events to Apache Kafka from a behavior. It's just something I was playing with so it is not production-hardened code.