Transfer Service 3.3


resplin

This page is obsolete.

The official documentation is at: http://docs.alfresco.com





Introduction


The Transfer Service came into existence in version 3.3 of Alfresco. Its purpose is to provide a means of pushing information out of an Alfresco core repository ('DM') to configured targets. The transfer service is accessible as a bean named 'TransferService' that is defined, along with other related beans, in the transfer-service-context.xml Spring context file.

This page discusses the implementation in Alfresco 3.3. For later versions of Alfresco, see the Transfer Service page.


Overview


In version 3.3, the Transfer Service is a subsystem with an API that offers the following features:


  • Register a new transfer target
  • Transfer a new node for the first time (no corresponding node on the target)
  • Transfer a node that has a path-based corresponding node on the target
  • Transfer an update to a previously deployed node
  • Transfer any number of nodes
  • Transfer a restored node
  • Specify which association types are to be traversed for a given transfer
  • Discover what transfer targets are available
  • Configure a transfer target to use either HTTP or HTTPS
  • Cancel a transfer
  • Verify the details of a transfer target
  • View a record of a previous transfer attempt
  • Indicate that a transfer should run asynchronously with callback
  • Edit the attributes of a transfer target
  • Get the status of a given transfer
  • Restrict a transfer by class of content
  • Retrieve transfer records for a given transfer target
  • Unregister an existing transfer target
  • Delete a node by transferring the node's archive node ref

In 3.3, the Transfer Service enables nodes to be conveyed from one Alfresco repository to another Alfresco repository.


Design


As one might expect, the Transfer Service comprises two major parts: the part that is responsible for sending information from the source repository and the part that is responsible for receiving information in the target repository. The source repository pushes information to the target repository over a network transport. In 3.3 there is support for the use of HTTP and HTTPS across the network. Connections needed for a transfer always originate from the source.

Through the Transfer Service it is possible to create and persist information about any number of Transfer Targets. A Transfer Target records sufficient information about the target system to enable the service to establish an authenticated connection to it. Each transfer target record in the source repository is placed in a transfer target group. Currently there is just one transfer target group defined (called 'Default') and there is no means of creating new ones. The service is likely to be extended in the future to allow the management of transfer target groups.

Each transfer target is named, and the name must be unique within the transfer target group that contains it. Some operations on the TransferService interface allow for a transfer target name to be supplied but not the name of a transfer target group. In these cases the default transfer target group is assumed.

In order to transfer information to one of the configured transfer targets you simply create a Transfer Definition and pass it to the Transfer Service along with the name of the transfer target that you want to transfer to. A transfer definition identifies what should be transferred and has the potential to include some directives about how it should be transferred. In 3.3 a transfer definition comprises simply a collection of NodeRef objects. Note that it is acceptable for this collection to include NodeRefs of nodes that are in the archive store. When the target repository receives such a NodeRef during a transfer the corresponding node will be deleted.

When the transfer service receives a request for a transfer to be made, the first thing it does is export a snapshot of the nodes that are included in the transfer. This snapshot contains all the nodes' properties, but not the content of any properties of type d:content - instead the relevant content URLs for the content files are included in the snapshot. This makes the snapshot relatively lightweight and quick to generate. Once the snapshot has been created, the transfer service makes contact with the specified target and initiates a transfer. Any given target can receive just one transfer at a time currently, as this ensures that conflicts can't occur. The target puts in place a lock (a node named '.lock' beneath the Data Dictionary/Transfers folder) and returns a unique identifier for the transfer, and the source then starts transmitting the snapshot.
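
The one-transfer-at-a-time rule can be pictured with a small sketch. This is not the Alfresco receiver code: the class and method names are hypothetical, and an in-memory field stands in for the '.lock' node described above. It only illustrates that starting a transfer either yields a fresh transfer identifier or is refused while another transfer holds the lock.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical stand-in for the target's lock handling; not Alfresco code.
class TargetLockSketch {
    // Identifier of the transfer that currently holds the lock, or null when idle.
    private String currentTransferId;

    // Tries to begin a transfer: returns a new identifier, or empty if the target is busy.
    synchronized Optional<String> start() {
        if (currentTransferId != null) {
            return Optional.empty();
        }
        currentTransferId = UUID.randomUUID().toString();
        return Optional.of(currentTransferId);
    }

    // Releases the lock, but only when asked by the transfer that holds it.
    synchronized boolean end(String transferId) {
        if (transferId == null || !transferId.equals(currentTransferId)) {
            return false;
        }
        currentTransferId = null;
        return true;
    }
}
```

In this sketch a second call to start() returns an empty result until end() has been called with the identifier handed out by the first call.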

As the target system receives the snapshot it streams it into a local staging area on disk. After having sent the snapshot, the transfer service then works out which content items are required and sends the necessary content files over. These are batched up in groups - one of the goals of the design was to minimize the 'chattiness' of the transfer protocol - and staged on the target system's local disk.
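
As a rough illustration of the batching idea, the sketch below splits a list of content URLs into fixed-size groups so that each group could be sent in one request. The class name, method name, and the notion of a fixed batch size are assumptions made for the example; the actual grouping strategy used by the Transfer Service is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; shows batching in general, not the Alfresco implementation.
class ContentBatchSketch {
    // Splits the content URLs into consecutive groups of at most batchSize entries.
    static List<List<String>> batch(List<String> contentUrls, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < contentUrls.size(); i += batchSize) {
            int end = Math.min(i + batchSize, contentUrls.size());
            // Copy the sub-list so each batch is independent of the input list.
            batches.add(new ArrayList<>(contentUrls.subList(i, end)));
        }
        return batches;
    }
}
```

Fewer, larger requests mean fewer round trips between source and target, which is the 'chattiness' the design set out to reduce.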

Once the snapshot and associated content files have been transmitted, the transfer service asks the target system to commit the data to its repository. At this point, the receiver on the target system parses the received snapshot and reproduces the contained information in its local repository. This is done in three stages by default - the first writing the nodes and their properties, the second dealing with associations and the third dealing with sync mode delete.  It is possible to add new stages into this process if desired. When receiving a node, the receiver tries to resolve a corresponding node in the target repository based first on the node ref and then on the node's path. When a node is transferred that does not have a corresponding node in the target repository (either by path or by node ref) then a new node is created that has the same node ref as the transferred node.

Throughout the commit process, a record of what is being done is written to a node in the target repository. This node, stored in the 'Inbound Transfer Records' folder beneath 'Data Dictionary/Transfers', also has a few properties that record the transfer status. The name of this transfer record node is the date/time stamp at which the transfer started. After the transfer has completed, this transfer report is pulled back to the source system and written as the 'destination transfer report', which is a sibling of the 'client transfer report' placed below the transfer target.

On the source end of the transfer, the caller may choose whether the transfer should be carried out synchronously (transfer) or asynchronously (transferAsync). Whichever version is used, the caller may optionally provide one or more callback objects (implementing the TransferCallback interface). As the transfer proceeds these objects are notified of progress by events being passed to their processEvent operation. One of these events (TransferEventBegin) contains the transfer identifier, and, once received, this can be used by the caller to cancel an 'in-flight' transfer.
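
The pattern of remembering the transfer identifier so that an in-flight transfer can be cancelled later might look like the sketch below. The Event, BeginEvent, and Callback types are hypothetical stand-ins, modelled loosely on TransferEvent, TransferEventBegin, and TransferCallback, so that the example compiles without the Alfresco libraries; the real interfaces may expose the identifier differently.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in types; not the org.alfresco.service.cmr.transfer API.
class CallbackSketch {
    interface Event {}

    // Plays the role of the 'begin' event, which carries the transfer identifier.
    static final class BeginEvent implements Event {
        final String transferId;
        BeginEvent(String transferId) { this.transferId = transferId; }
    }

    // Any other event raised during the transfer.
    static final class OtherEvent implements Event {}

    interface Callback {
        void processEvent(Event event);
    }

    // Records the transfer identifier as soon as the begin event arrives.
    static final class IdCapturingCallback implements Callback {
        private final AtomicReference<String> transferId = new AtomicReference<>();

        @Override
        public void processEvent(Event event) {
            if (event instanceof BeginEvent) {
                transferId.set(((BeginEvent) event).transferId);
            }
        }

        // Returns null until the begin event has been seen.
        String getTransferId() {
            return transferId.get();
        }
    }
}
```

An AtomicReference is used because, with an asynchronous transfer, the thread that receives events may differ from the thread that later decides to cancel.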

As well as the interfaces and mechanisms needed to actually carry out the transfer, there are also a few classes intended to help build the set of nodes that the caller wants to transfer. The relevant interfaces are NodeCrawlerFactory, NodeCrawler, NodeFinder, and NodeFilter (all in the package org.alfresco.service.cmr.transfer). There is one implementation of each of the NodeCrawlerFactory and NodeCrawler interfaces (the standard NodeCrawlerFactory is a bean named 'NodeCrawlerFactory'). There are a couple of NodeFinder implementations that enable associations to be traversed (child and peer), and one NodeFilter implementation that enables content of given classes (types and aspects) to be included and excluded from the node crawl. It's simple to add new finders and filters to provide custom behaviour that meets a particular need.
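
The interplay of finders and filters can be sketched as a plain graph walk. In the example below, nodes are represented as strings, a finder is a function from a node to the nodes reachable from it, and a filter is a predicate that decides whether a node is included. These representations, and the choice not to traverse beyond a node that a filter rejects, are assumptions made for the sketch rather than a description of the standard NodeCrawler implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical crawl over string node ids; not the Alfresco NodeCrawler.
class CrawlSketch {
    static Set<String> crawl(String root,
                             List<Function<String, Set<String>>> finders,
                             List<Predicate<String>> filters) {
        Set<String> found = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(root);
        while (!pending.isEmpty()) {
            String node = pending.poll();
            if (found.contains(node)) {
                continue; // already visited, which also guards against cycles
            }
            // A node is included only if every filter accepts it.
            boolean accepted = filters.stream().allMatch(f -> f.test(node));
            if (!accepted) {
                continue; // assumption: rejected nodes are not traversed further
            }
            found.add(node);
            // Each finder contributes more nodes to visit.
            for (Function<String, Set<String>> finder : finders) {
                pending.addAll(finder.apply(node));
            }
        }
        return found;
    }
}
```

A custom finder or filter then amounts to supplying another function or predicate to the walk.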

Note that the interface exposed by the target repository (the receiver) should be considered an internal interface. It is liable to change over time, and no effort will be made to retain backwards compatibility.


Events raised during a transfer


As mentioned above, when requesting a transfer it is possible to supply a collection of TransferCallback objects. The TransferCallback interface defines one operation:



void processEvent(TransferEvent event);

As the transfer proceeds, events are raised and passed to each of the callback objects. The classes of events that can be raised are:


  • TransferEventBegin is sent when the transfer starts. It contains the identifier of the transfer which can later be used to cancel the transfer if desired
  • TransferEventEnterState is sent immediately after the transfer moves to a new state. The possible states of a transfer are START, SENDING_SNAPSHOT, SENDING_CONTENT, PREPARING, COMMITTING, SUCCESS, and ERROR. The state of the transfer is always available from any event via its getTransferState operation.
  • TransferEventEndState is sent immediately prior to the transfer leaving its current state.
  • TransferEventSendingSnapshot is sent when the snapshot file is being transmitted to the target repository
  • TransferEventSendingContent is sent when a content file is being transmitted to the target repository
  • TransferEventSentContent is sent when a content file has been sent to the target repository
  • TransferEventCommittingStatus is sent to provide a progress update while the target repository is processing the transferred data. It has two operations that provide this information: getPosition, which indicates how far the process has got, and getRange, which indicates the point the process must reach before it is complete. Note that the value of the range can change as the process proceeds.
  • TransferEventSuccess is sent if the transfer completes successfully
  • TransferEventError is sent if the transfer ends with an error. This event exposes an operation named getException that can be used to help determine the cause of the problem.
  • TransferEventReport is sent when a transfer report is written. It contains the NodeRef of the report and the type of the report. There are currently two types of report: one from the source and one from the destination.
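
A callback will often want to turn the position and range reported during the commit into a percentage, and to know when a transfer has finished. The sketch below shows one way to do both. The state names are taken from the list above, but the enum and helper methods are hypothetical and defined locally so that the example is self-contained; they are not the Alfresco types.

```java
// Hypothetical helpers; the state names mirror those listed on this page.
class CommitProgressSketch {
    enum State { START, SENDING_SNAPSHOT, SENDING_CONTENT, PREPARING, COMMITTING, SUCCESS, ERROR }

    // A transfer is finished once it reaches SUCCESS or ERROR.
    static boolean isTerminal(State state) {
        return state == State.SUCCESS || state == State.ERROR;
    }

    // Converts a position within a range into a whole percentage between 0 and 100.
    // The range may change between events, so it is passed in on every call.
    static int percentComplete(long position, long range) {
        if (range <= 0) {
            return 0; // nothing meaningful to report yet
        }
        long clamped = Math.max(0, Math.min(position, range));
        return (int) (clamped * 100 / range);
    }
}
```

Because the range can grow while the commit is running, a percentage computed this way may occasionally move backwards; a progress display may want to allow for that.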




Location of classes relevant to the Transfer Service


The classes and interfaces that comprise the public API to the Transfer Service are located in the org.alfresco.service.cmr.transfer package. The core of the implementation is in the org.alfresco.repo.transfer package and its sub-packages manifest, report, and script. Log levels can be adjusted for these packages if more or less log information is desired from the transfer mechanisms.


Usage Examples


Creating a new Transfer Target



// Create the target definition and describe how to reach the remote repository
TransferTarget target = transferService.create("The Other Repo");
target.setEndpointProtocol("https");
target.setEndpointHost("other.repo.example.com");
target.setEndpointPath("/alfresco/service/api/transfer");
target.setUsername("remoteperson");
target.setPassword("password".toCharArray());
// Persist the target so that it can be used for transfers
transferService.saveTransferTarget(target);

Note that a transfer target must be committed into the repository before it can be used for a transfer.


Building a set of nodes to transfer



//This example walks a tree of nodes starting at a given root node (assumed to be known already). It traverses
//only associations of type 'cm:contains' (therefore, presumably, the root node is of type cm:folder (or subtype))

NodeCrawler crawler = nodeCrawlerFactory.getNodeCrawler();
crawler.setNodeFinders(new ChildAssociatedNodeFinder(ContentModel.ASSOC_CONTAINS));
Set<NodeRef> nodesInTree = crawler.crawl(rootNode);

Transferring a set of nodes synchronously



//This snippet uses the target name and set of nodes used in the previous examples.
TransferDefinition transferDef = new TransferDefinition();
transferDef.setNodes(nodesInTree);
NodeRef transferReportNode = transferService.transfer("The Other Repo", transferDef);

Where transferred nodes are placed in the target repository


When a node is transferred, a package of information about it is sent from the source repository to the target repository. Among other things, that information includes:


  • the node ref (store ref + UUID) of the node
  • the node ref of the node's primary parent
  • the qualified path of the node's primary parent
  • the qualified name of the node's primary parent association
  • the qualified type of the node's primary parent association

This information is used by the transfer receiver in the target repository to work out where the transferred node should be placed and whether a 'corresponding node' already exists in that location. This is done in the following way:


  1. if a node exists with the same node ref (store + UUID) then this is considered to be the corresponding node, and the transfer is handled as an update to that node
  2. if a node exists with the same node ref as the transferred node's primary parent then this is considered to be the corresponding parent node
  3. if the store of the transferred node's primary parent does not exist in the target repository then the transfer fails
  4. if the corresponding parent node has not yet been found then try to resolve it by path in the store identified by the node's primary parent node ref
  5. if the corresponding parent node has still not been found then the incoming node is currently an 'orphan' in the receiving repository - its corresponding parent node is mapped to a temporary location and the transfer is handled as a creation of the node
  6. if a node exists in the target repository that is associated with the corresponding parent node with the same name as the transferred node is associated with its parent node, then this is considered to be the corresponding node, and the transfer is handled as an update to that node
  7. if a corresponding node has still not been found then the following additional checks are made:
    1. if the store of the transferred node is the archive store then an attempt is made to find its corresponding node in the store corresponding to its original store (by node ref and by path). If this successfully finds a corresponding node then the transfer is handled as a deletion of the corresponding node
    2. if the store of the transferred node is not the archive store then an attempt is made to find its corresponding node in the archive store of the receiving repository (by node ref only). If this successfully finds a corresponding node then the transfer is handled as a restore of the corresponding node (followed by an update of the restored node)

In the case where the inbound node is initially determined to be an 'orphan', this status is continuously checked during the course of that transfer. If its parent node appears later on in the same transfer then the orphan is re-parented. Note that orphans are not permitted to remain following a transfer. If an orphan's parent node does not appear during the same transfer then the transfer will fail.
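
Steps 1 to 6 of the procedure above can be condensed into a small decision function. The sketch below is a deliberately simplified model: the target repository is reduced to a set of store refs and a map from node ref to qualified path, the archive and restore handling of step 7 is left out, and paths are not scoped to a store. All class, field, and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the receiver's placement decision.
class PlacementSketch {
    enum Outcome { UPDATE, CREATE, CREATE_AS_ORPHAN, FAIL }

    // The information about a transferred node that the decision relies on.
    static final class Incoming {
        final String nodeRef, parentRef, parentStore, parentPath, assocName;
        Incoming(String nodeRef, String parentRef, String parentStore,
                 String parentPath, String assocName) {
            this.nodeRef = nodeRef;
            this.parentRef = parentRef;
            this.parentStore = parentStore;
            this.parentPath = parentPath;
            this.assocName = assocName;
        }
    }

    static Outcome resolve(Incoming in, Set<String> targetStores,
                           Map<String, String> targetPathByNodeRef) {
        // Step 1: same node ref on the target means an update.
        if (targetPathByNodeRef.containsKey(in.nodeRef)) {
            return Outcome.UPDATE;
        }
        // Step 2: look for the parent by node ref.
        String parentPathOnTarget = targetPathByNodeRef.get(in.parentRef);
        // Step 3: the parent's store must exist on the target.
        if (!targetStores.contains(in.parentStore)) {
            return Outcome.FAIL;
        }
        // Step 4: otherwise try to find the parent by path.
        if (parentPathOnTarget == null && targetPathByNodeRef.containsValue(in.parentPath)) {
            parentPathOnTarget = in.parentPath;
        }
        // Step 5: no parent found, so the node starts out as an orphan.
        if (parentPathOnTarget == null) {
            return Outcome.CREATE_AS_ORPHAN;
        }
        // Step 6: a child with the same association name under that parent is an update.
        if (targetPathByNodeRef.containsValue(parentPathOnTarget + "/" + in.assocName)) {
            return Outcome.UPDATE;
        }
        return Outcome.CREATE;
    }
}
```

In this model an orphan is simply reported as an outcome; the re-parenting that happens later in the same transfer, and the failure when no parent ever arrives, are not modelled.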




Wish List


For the current 'wish list' see the page Transfer_Service