Smart Content Migration using Apache ManifoldCF


openpj
Moderator

If you need to migrate content into your Alfresco repository from different data sources, you may be interested in the session I held with my friend Luis Cabaceira at the last Alfresco DevCon 2018 in Lisbon.

The title of our session is Smart Content Migration using Apache ManifoldCF. We described two different scenarios based on a fairly new capability that we are adding: content migration!

This will make content migration easier for any ECM architect or developer who has to federate different silos / repositories into an Alfresco platform. We have implemented two new connectors:

  • CMIS Output Connector (presented by me)
  • Alfresco BFSI Output Connector (presented by Luis)

 

The CMIS Output Connector migrates content from any repository supported by ManifoldCF, such as SharePoint, Documentum or FileNet, to any CMIS-compliant repository such as Alfresco :)

The Alfresco BFSI Output Connector gets content from any repository and creates the standard structure on the file system that Alfresco needs to import content using the BFSI (Bulk File System Import) tool. This is absolutely amazing in terms of performance!
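
To give an idea of what such a file system structure can look like, here is a minimal sketch in Python. It is not the connector's code (the connector is written in Java). It assumes the layout that, as far as I understand the Bulk Import documentation, the tool expects: each content file sits next to a shadow file named `<file name>.metadata.properties.xml` in Java XML properties format. The function name `write_bfsi_entry` and the sample path and values are hypothetical.

```python
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr


def write_bfsi_entry(root, rel_path, content, node_type, aspects, props):
    """Write one content file plus its shadow metadata file (illustrative only)."""
    target = Path(root) / rel_path
    # Folders on disk become folders in the repository after the import.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)

    # Assumed convention: "type" and "aspects" entries, then one entry per property.
    entries = {"type": node_type, "aspects": ",".join(aspects), **props}
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">',
        "<properties>",
    ]
    for key, value in entries.items():
        lines.append(f"  <entry key={quoteattr(key)}>{escape(value)}</entry>")
    lines.append("</properties>")

    # The shadow file shares the content file's full name plus a fixed suffix.
    meta = target.with_name(target.name + ".metadata.properties.xml")
    meta.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target, meta


if __name__ == "__main__":
    # Hypothetical example: one PDF with a title, written under ./export
    write_bfsi_entry(
        "export",
        "Sites/demo/report.pdf",
        b"%PDF-1.4 ...",
        "cm:content",
        ["cm:titled"],
        {"cm:title": "Quarterly report"},
    )
```

Writing plain files like this is what makes the approach fast: the repository only has to read a local directory tree during the import.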

Migration architecture of the new Content Migration included in Apache ManifoldCF

Try the new connectors for Content Migration (beta) yourself

Please note that all the material shown during the session relates to a development sprint that we still have to complete, so you will not yet find these new features in the stable release of ManifoldCF.

In the meantime, you can try the new connectors by building from the related branches:

CMIS Output Connector

JIRA issue: https://issues.apache.org/jira/browse/CONNECTORS-1356
Source code: https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1356-2.7.1/


Alfresco BFSI Output Connector

JIRA issue: https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1442
Source code: https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1442/

Stay in touch with us in the ManifoldCF Community

If you are interested in our work, or if you want to contribute, please stay in touch with us by subscribing to our mailing lists or joining our chat room to follow the development of Apache ManifoldCF. All the references are below:

Web Forums / Mailing lists

https://lists.apache.org/list.html?dev@manifoldcf.apache.org

https://lists.apache.org/list.html?user@manifoldcf.apache.org

Classic Mailing list

http://manifoldcf.apache.org/en_US/mail.html

HipChat Public Room

https://www.hipchat.com/gOSgu3lH8

A complete article about my involvement during Alfresco DevCon 2018 can be found on my website.

If you need a quote for the design and implementation of migration projects, please contact us.

We look forward to talking with you to understand how we can help you here at TAI Solutions.

About the Author
Solution Architect AIIM Professional Member Certified Alfresco Instructor / ACE / ACA Mentor, PMC Member and Committer @ Apache Software Foundation Author @ Packt Publishing Project Leader @ JBoss Community
5 Comments
yogeshpj
Established Member

Do these connectors take care of associations between content items after migration?

openpj
Moderator

I have to confess that a lot of people are asking me the same question, and my reply is... it depends!

 

Regarding the new Alfresco BFSI Output Connector, we discussed this in the ManifoldCF community together with Luis Cabaceira some months ago. I think the problem is related to the Alfresco Bulk File System Import itself.

It should work correctly for any child association, which the Alfresco tool supports natively: we are essentially generating the file system layout consumed by the Bulk File System Import in Alfresco, and that layout also carries child associations, if I'm not wrong.

I don't remember whether Alfresco BFSI currently supports importing nodes with peer associations. Historically I have found the following:

Add support for peer-to-peer associations within the imported content set · Issue #16 · pmonks/alfre... 

In Alfresco JIRA we currently see the following:

[MNT-12641] Bulk Import tool does not work with associations - Alfresco JIRA 

Associations between content items loaded by the tool are not yet nicely supported. 
Associations to objects that are already in the repository can be created using
the NodeRef of the target object as the value of the property.

 

This is a tricky problem to solve automatically and in a general way, but with ManifoldCF we could solve it just by adding some pipeline support to our current Output Connectors.

Keep in mind that when you execute a migration job, you have to be sure that the current execution migrates an entire partition of the repository. The partition should be fully consistent in terms of associated nodes, otherwise some of the nodes might not have all their decorations in terms of metadata.

If you read the Bulk File System Import documentation:

Preparing the Source Content · pmonks/alfresco-bulk-import Wiki · GitHub 

There it is explicitly mentioned that to import associations you have to be sure that you have imported the leaves of your tree before importing the associated nodes. I'm not sure that the Alfresco BFSI can do that automatically.

I would like to solve this problem anyway using the CMIS Output Connector, but at the moment we do not yet support properties migration. Consider that the current version included in the latest stable release was implemented over an entire year of our spare time. In the next version I hope to release complete management of properties using the ManifoldCF Metadata Adjuster, so that everyone can configure the current schema and the target schema for each property.

What we can do to solve the problem of associations between migrated nodes is to implement a second iteration for each migration job execution, dedicated to persisting the associations in the ManifoldCF database; in the last lap it would then create all the association instances.

This is not easy, because in a first step we have to be sure to migrate all the child nodes considering the primary parent associations, and then all the secondary parent associations. It will be challenging, but it is absolutely feasible.
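
Just to make the idea concrete, here is a rough sketch of that two-step approach in Python. It is not ManifoldCF code and none of these names exist in the project: `SourceNode`, `migrate`, `create_node` and `create_association` are hypothetical, and an in-memory list stands in for the associations that would be persisted in the ManifoldCF database.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SourceNode:
    node_id: str
    primary_parent: Optional[str]  # None means a root of the migrated partition
    secondary_parents: List[str] = field(default_factory=list)
    peers: List[str] = field(default_factory=list)  # targets of peer associations


def migrate(
    nodes: List[SourceNode],
    create_node: Callable[[SourceNode, Optional[str]], str],
    create_association: Callable[[str, str, str], None],
) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    id_map: Dict[str, str] = {}                # source id -> target id
    pending: List[Tuple[str, str, str]] = []   # (kind, source id, source id)
    remaining = {n.node_id: n for n in nodes}

    # First lap: create nodes following primary parents, parents before children.
    while remaining:
        ready = [n for n in remaining.values()
                 if n.primary_parent is None or n.primary_parent in id_map]
        if not ready:
            # A primary parent is outside the set: the partition is not complete.
            raise ValueError(f"inconsistent partition: {sorted(remaining)}")
        for n in ready:
            parent_target = id_map.get(n.primary_parent) if n.primary_parent else None
            id_map[n.node_id] = create_node(n, parent_target)
            # Only record the other associations for now.
            pending.extend(("secondary-child", p, n.node_id) for p in n.secondary_parents)
            pending.extend(("peer", n.node_id, t) for t in n.peers)
            del remaining[n.node_id]

    # Last lap: every node exists, so the recorded associations can be created.
    unresolved = []
    for kind, src, tgt in pending:
        if src in id_map and tgt in id_map:
            create_association(kind, id_map[src], id_map[tgt])
        else:
            unresolved.append((kind, src, tgt))  # other end was not migrated
    return id_map, unresolved
```

The `unresolved` list is where the consistency requirement shows up: an association whose other end is outside the migrated partition cannot be created in the same job.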

For the next release of ManifoldCF I hope to include the Alfresco BFSI Output Connector as well, and then add metadata support to the CMIS Output Connector. Then we could start working on migrating associated nodes with primary and secondary child associations and peer associations.

We can do that; we only need some time to invest in this development on ManifoldCF.

Any volunteers? :)

yogeshpj
Established Member

Thanks Piergiorgio for your reply. I had doubts about associations in both output connectors. I will be happy to put some effort into development on ManifoldCF.

openpj
Moderator

yogesh_PJ, thank you for your interest in the project. Please subscribe to our dev mailing list:

ManifoldCF - Mailing Lists 

Take a look at the developer resources page:

ManifoldCF - Developer Resources 

Or, if you want to use the mailing lists as forums, use the web version here:

Apache Mail Archives 

Drop us an email with any request for information or any doubts that you have, don't be shy :)

marcog
Partner

Good afternoon, and thank you for your report on ManifoldCF integrated with Alfresco. I have a question about it, although I don't know if this is the right place to discuss it.

I'm syncing Google Drive with Alfresco using jobs in ManifoldCF, but some folders in Drive were created with the character "." at the end of the name (e.g. Photos 2019.).

This generates an error stating that a name with that character is not allowed.

Is there any way to configure the job to remove this "." at the end of the file or folder name?

Marco