How to run a metadata extraction during bulk upload

boneill
Partner


Hi Guys,

We are upgrading to Alfresco 7.0.1. The new architecture uses the Alfresco Transform Service all-in-one (AIO) T-Engines. By default, these are configured to extract metadata whenever an item is added to Alfresco.

What I need is a metadata extraction that maps additional values from the source document to the document's metadata in Alfresco. I only want to do this during a bulk upgrade. Our previous solution was a rule on a folder that ran a custom metadata extraction on document creation, which allowed us to turn the rule on or off during migrations. The new Alfresco Transform Service framework now handles metadata extraction. Therefore:

1) If I want to extract extra fields, is it possible to request a metadata extraction transform from a rule, using either Java or JavaScript?

2) Should I create a new T-Engine specifically to do the extraction of existing fields, or update the config for the existing T-Engine (which, from what I understand, always gets called)?

3) Is there a different way to do this other than using a metadata extractor?

Regards

Brian

 

 

1 Reply
afaust
Master

Re: How to run a metadata extraction during bulk upload

1) You can call the generic "extract-metadata" action to trigger an extraction from a rule. Alternatively, you can build a custom action that calls RenditionService2 (not part of the supported Java API) to perform a rendition to the magic target mimetype "alfresco-metadata-extract". If there are multiple possible T-Engine extractions for the same source mimetype, you can't really select which one runs, except by passing transform options that apply only to the one you intend to use.

2) The default T-Engines should use the overwrite policy PRAGMATIC, which sets metadata fields only if they don't exist, are null, are the empty string, or are EXIF/audio model-related. I have not yet found how to configure the engines to use PRUDENT instead, which removes the special handling for EXIF/audio, or even CAUTIOUS (which sets a property only if it is not set yet, and won't set it if it was explicitly set to null). If you find a way to specify the overwrite policy via config, that would be my way to go. Otherwise: custom engine.

3) There always is a different way. None that makes use of default Alfresco features though...
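
To make the difference between the three overwrite policies in point 2 concrete, here is a minimal standalone sketch that models them as described above. It is not Alfresco code: the class name `OverwritePolicySketch`, the `shouldWrite` and `apply` helpers, the sample property names, the treatment of "EXIF/audio model-related" as a simple `exif:`/`audio:` prefix check, and the skipping of null extracted values are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the overwrite policies described in the reply; not Alfresco's implementation.
class OverwritePolicySketch {

    public enum Policy { PRAGMATIC, PRUDENT, CAUTIOUS }

    // Assumption: "EXIF/audio model-related" is approximated by a prefix check.
    static boolean isMediaProperty(String property) {
        return property.startsWith("exif:") || property.startsWith("audio:");
    }

    // Decides whether an extracted value may replace the existing one under the given policy.
    public static boolean shouldWrite(Policy policy, Map<String, Object> existing, String property) {
        if (!existing.containsKey(property)) {
            return true; // all three policies set a property that does not exist yet
        }
        if (policy == Policy.CAUTIOUS) {
            return false; // present, even if explicitly null: leave it alone
        }
        Object current = existing.get(property);
        if (current == null || current.toString().isEmpty()) {
            return true; // PRAGMATIC and PRUDENT fill null or empty-string values
        }
        // Only PRAGMATIC keeps the special handling for EXIF/audio properties.
        return policy == Policy.PRAGMATIC && isMediaProperty(property);
    }

    // Returns a copy of the existing properties with the permitted extracted values merged in.
    public static Map<String, Object> apply(Policy policy,
                                            Map<String, Object> existing,
                                            Map<String, Object> extracted) {
        Map<String, Object> result = new HashMap<>(existing);
        for (Map.Entry<String, Object> entry : extracted.entrySet()) {
            // Assumption of this sketch: null extracted values are never written.
            if (entry.getValue() != null && shouldWrite(policy, existing, entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> existing = new HashMap<>();
        existing.put("cm:title", "Old title");
        existing.put("cm:description", "");
        existing.put("cm:author", null);
        existing.put("exif:model", "Old camera");

        Map<String, Object> extracted = new HashMap<>();
        extracted.put("cm:title", "New title");
        extracted.put("cm:description", "New description");
        extracted.put("cm:author", "New author");
        extracted.put("exif:model", "New camera");
        extracted.put("custom:reference", "R-1"); // hypothetical extra field

        for (Policy policy : Policy.values()) {
            System.out.println(policy + " -> " + apply(policy, existing, extracted));
        }
    }
}
```

In this model, a bulk migration that must never touch values already present on the node would want CAUTIOUS-like behaviour, which is why the ability to configure the policy (or a custom engine) matters for the question asked.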