This page is marked obsolete.
The official documentation is at: http://docs.alfresco.com
This document assumes knowledge of how to extend the repository configuration.
Meta-data extractors offer server-side extraction of values from added or updated content.
System administrators can find definitions of the default set of extractors in
Configuration options are detailed in the Javadocs
Sample configurations are in
Developers can study the default implementations contained in package:
Note: The interface is named MetadataExtracter, although the correct spelling would be MetadataExtractor. Apart from class, interface, and bean names, this document uses the spelling "extractor".
One of the default actions that can be triggered in a space is Extract Common Metadata. This action will look at the mimetype of the document that triggered the rule and request an appropriate MetadataExtracter from the default MetadataExtracterRegistry. Each extractor is registered to handle a set of mimetypes.
Before V2.1, the extractor pulled a set of values from the document and copied them directly into the document's meta-data. If a property was declared as part of an aspect in the model, the aspect was also added to the document. A property that already existed was not overwritten by the extractor. Developers can look at org.alfresco.repo.content.metadata.AbstractMetadataExtracter.
V2.1 separated the extraction of values from the document and the setting of system properties. In V2.1, the extractor will pull a set of values from the document (the full list is declared in the javadocs of each class). The extractor uses a set of properties to map the extracted values to the document's meta-data. By default, the extractor will not overwrite any properties already present in the document's meta-data, but this can be changed by overriding the extractor's bean definition. Developers should look at org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter.
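The V2.1 flow described above (extract raw values, map them to properties, then apply them without overwriting by default) can be sketched in plain Java. This is only an illustration under stated assumptions, not Alfresco code: the class MappingSketch, its apply method, and the sample values are hypothetical, and property names are plain strings rather than qualified names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MappingSketch {

    /**
     * Maps raw extracted values to property names and merges them into a copy
     * of the existing properties. Hypothetical helper, not part of the Alfresco API.
     */
    static Map<String, Object> apply(Map<String, Object> extracted,
                                     Map<String, String> mapping,
                                     Map<String, Object> existing,
                                     boolean overwrite) {
        Map<String, Object> result = new LinkedHashMap<>(existing);
        for (Map.Entry<String, Object> entry : extracted.entrySet()) {
            String target = mapping.get(entry.getKey());
            if (target == null) {
                continue; // no mapping declared: the extracted value is not used
            }
            if (overwrite || !result.containsKey(target)) {
                result.put(target, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> extracted = new LinkedHashMap<>();
        extracted.put("title", "Quarterly report");
        extracted.put("creator", "jsmith");
        extracted.put("generator", "SomeEditor"); // extracted but unmapped

        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("title", "cm:title");
        mapping.put("creator", "cm:author");

        Map<String, Object> existing = new LinkedHashMap<>();
        existing.put("cm:title", "Title set by the user");

        // Default behaviour in this sketch: existing cm:title is kept
        System.out.println(apply(extracted, mapping, existing, false));
        // Overwrite enabled: cm:title is replaced by the extracted value
        System.out.println(apply(extracted, mapping, existing, true));
    }
}
```

Keeping extraction and mapping as separate inputs is what lets the mapping be changed in configuration without touching the extraction code.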
These are the extractors defined within <WEB-INF>/classes/alfresco/content-services-context.xml for V2.1:
<bean id='extracter.PDFBox' class='org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter' parent='baseMetadataExtracter' />
<bean id='extracter.Office' class='org.alfresco.repo.content.metadata.OfficeMetadataExtracter' parent='baseMetadataExtracter' />
<bean id='extracter.Mail' class='org.alfresco.repo.content.metadata.MailMetadataExtracter' parent='baseMetadataExtracter' />
<bean id='extracter.Html' class='org.alfresco.repo.content.metadata.HtmlMetadataExtracter' parent='baseMetadataExtracter' />
<bean id='extracter.MP3' class='org.alfresco.repo.content.metadata.MP3MetadataExtracter' parent='baseMetadataExtracter' />
<bean id='extracter.OpenDocument' class='org.alfresco.repo.content.metadata.OpenDocumentMetadataExtracter' parent='baseMetadataExtracter' />
<bean id='extracter.OpenOffice' class='org.alfresco.repo.content.metadata.OpenOfficeMetadataExtracter' parent='baseMetadataExtracter' >
    <property name='connection'>
        <ref bean='openOfficeConnection' />
    </property>
</bean>
Assuming you have a new extractor written in class com.company.MyExtracter, you can declare the extractor:
<extension-config>/alfresco/extension/custom-repository-context.xml:
<bean id='com.company.MyExtracter' class='com.company.MyExtracter' parent='baseMetadataExtracter' />
The parent bean (parent='baseMetadataExtracter') will register the extractor with the metadataExtracterRegistry bean. It will automatically be available for use by the Alfresco server to handle the mimetypes that your extractor declared.
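The idea of a registry that hands out an extractor per mimetype can be sketched as follows. This is a simplified, hypothetical model rather than the actual MetadataExtracterRegistry: the class and method names are invented for illustration, and the sketch assumes that a later registration for the same mimetype simply replaces an earlier one, which may not match how the real registry chooses between extractors.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RegistrySketch {

    /** Hypothetical stand-in for an extractor: a name plus the mimetypes it declares. */
    static final class Extractor {
        final String name;
        final List<String> mimetypes;

        Extractor(String name, String... mimetypes) {
            this.name = name;
            this.mimetypes = Arrays.asList(mimetypes);
        }
    }

    private final Map<String, Extractor> byMimetype = new HashMap<>();

    /** Registers the extractor for every mimetype it declares. */
    void register(Extractor extractor) {
        for (String mimetype : extractor.mimetypes) {
            // Assumption of this sketch: the last registration wins
            byMimetype.put(mimetype, extractor);
        }
    }

    /** Returns the extractor for the mimetype, or null if none is registered. */
    Extractor getExtractor(String mimetype) {
        return byMimetype.get(mimetype);
    }

    public static void main(String[] args) {
        RegistrySketch registry = new RegistrySketch();
        registry.register(new Extractor("pdf", "application/pdf"));
        registry.register(new Extractor("html", "text/html", "application/xhtml+xml"));

        Extractor found = registry.getExtractor("text/html");
        System.out.println(found == null ? "none" : found.name);
        System.out.println(registry.getExtractor("image/png") == null ? "none" : "found");
    }
}
```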
Note: This applies to V2.1 only. Earlier versions were hardcoded.
We'll use the extracter.OpenDocument bean as an example of how to modify the configuration. This extractor handles all the OpenDocument formats using a connection to a headless OpenOffice process. Before reading more, open up the following:
Meta-data extractor for the MIMETYPE_OPENDOCUMENT_XXX mimetypes.
creationDate: -- cm:created
creator: -- cm:author
date:
description: -- cm:description
generator:
initialCreator:
keyword:
language:
printDate:
printedBy:
subject:
title: -- cm:title
All user properties
...
<bean id='extracter.OpenDocument'
      class='org.alfresco.repo.content.metadata.OpenDocumentMetadataExtracter'
      parent='baseMetadataExtracter' >
    <property name='inheritDefaultMapping'>
        <value>true</value>
    </property>
    <property name='mappingProperties'>
        <props>
            <prop key='namespace.prefix.cm'>http://www.alfresco.org/model/content/1.0</prop>
            <prop key='user1'>cm:description</prop>
        </props>
    </property>
</bean>
...
Copy the extension sample to <extension-config>/alfresco/extension/custom-metadata-extrators-context.xml to activate the sample for your server. The Javadocs for the extractor give the list (on the left) of values extracted from the document. All these extracted values are put into a map, ready for conversion to model-specific properties. By default, the following will be populated by the extractor:
creationDate: -- cm:created
creator: -- cm:author
description: -- cm:description
title: -- cm:title
Let's assume that a user property, user1, will be used by the Alfresco users to fill in the description of the documents they edit. The description field extracted by the extractor should be ignored and the user1 field used instead. We inherit all the other mappings and just modify how the user1 field is used.
You may prefer to put your changes in a properties file instead:
<extension-config>/alfresco/extension/custom-metadata-extrators-context.xml:
<bean id='extracter.OpenDocument'
      class='org.alfresco.repo.content.metadata.OpenDocumentMetadataExtracter'
      parent='baseMetadataExtracter' >
    <property name='inheritDefaultMapping'>
        <value>true</value>
    </property>
    <property name='mappingProperties'>
        <bean class='org.springframework.beans.factory.config.PropertiesFactoryBean'>
            <property name='location'>
                <value>classpath:alfresco/extension/custom-opendocument-extractor-mappings.properties</value>
            </property>
        </bean>
    </property>
</bean>
<extension-config>/alfresco/extension/custom-opendocument-extractor-mappings.properties:
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
user1=cm:description
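To make the layout of the mapping properties concrete, here is a small Java sketch that reads the two kinds of entries shown above: namespace.prefix.* entries declare prefixes, and every other entry maps an extracted key to a prefixed property name. The class MappingPropertiesSketch and its resolve method are hypothetical, and the {uri}localName output form is used only for illustration; the real extractor may resolve and validate mappings differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class MappingPropertiesSketch {

    static final String PREFIX_KEY = "namespace.prefix.";

    /** Resolves "extractedKey=prefix:localName" entries to "{uri}localName". */
    static Map<String, String> resolve(Properties props) {
        // First pass: collect the declared namespace prefixes
        Map<String, String> namespaces = new LinkedHashMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX_KEY)) {
                namespaces.put(key.substring(PREFIX_KEY.length()), props.getProperty(key));
            }
        }
        // Second pass: expand the prefix of every mapping entry
        Map<String, String> resolved = new LinkedHashMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX_KEY)) {
                continue;
            }
            String value = props.getProperty(key);
            int colon = value.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("No prefix in mapping: " + key + "=" + value);
            }
            String uri = namespaces.get(value.substring(0, colon));
            if (uri == null) {
                throw new IllegalArgumentException("Undeclared prefix in mapping: " + key + "=" + value);
            }
            resolved.put(key, "{" + uri + "}" + value.substring(colon + 1));
        }
        return resolved;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(
                "namespace.prefix.cm=http://www.alfresco.org/model/content/1.0\n"
                + "user1=cm:description\n"));
        System.out.println(resolve(props));
    }
}
```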
Document properties are generally extracted as Java String types, but this might not always be the case. When the properties are mapped to system properties, the extractor now explicitly performs a data type conversion to catch any failures at the point of extraction. If a value cannot be converted to the type that the data dictionary defines for its target property, it can either be discarded or cause the extraction to fail (the default is to fail).
    <property name='failOnTypeConversion'>
        <value>false</value>
    </property>
    ...
</bean>
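The two outcomes of a failed conversion can be illustrated with a short Java sketch. It is a hypothetical stand-in, not the extractor's actual conversion code: the class ConversionSketch and the method convertToInt are invented here, and an integer target type is assumed purely as an example.

```java
public class ConversionSketch {

    /**
     * Converts a raw extracted string to an Integer.
     * With failOnTypeConversion set, a bad value aborts the extraction;
     * otherwise the value is discarded (null is returned).
     */
    static Integer convertToInt(String raw, boolean failOnTypeConversion) {
        try {
            return Integer.valueOf(raw.trim());
        } catch (NumberFormatException e) {
            if (failOnTypeConversion) {
                throw new IllegalStateException("Cannot convert '" + raw + "' to an integer", e);
            }
            return null; // discard the value and carry on with the other properties
        }
    }

    public static void main(String[] args) {
        System.out.println(convertToInt("42", true));
        System.out.println(convertToInt("n/a", false));
        try {
            convertToInt("n/a", true);
        } catch (IllegalStateException e) {
            System.out.println("Extraction failed: " + e.getMessage());
        }
    }
}
```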
Alfresco's default String to Date conversion uses the ISO 8601 format, i.e. sYYYY-MM-DDThh:mm:ss.sssTZD. During meta-data extraction, date strings are seldom in this format. A list of alternative formats can be specified; they are used if the ISO 8601 conversion fails and the target system property is of type d:date or d:datetime.
...
<bean id='extracter.xml.wsf.articleMetadataExtracter'
      class='org.alfresco.repo.content.metadata.xml.XPathMetadataExtracter'
      parent='baseMetadataExtracter'
      init-method='init' >
    <property name='supportedDateFormats'>
        <list>
            <value>yyyy.MM.dd G 'at' HH:mm:ss z</value>
        </list>
    </property>
    ...
The formats in the list are tried in order until one succeeds or all of them have failed. For the full list of options to describe the date formats, see the SimpleDateFormat Javadocs.
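Trying a list of formats in order can be sketched with the standard SimpleDateFormat class. The class DateFallbackSketch, the parse helper, and the sample patterns are hypothetical; the sketch covers only the fallback list, not the initial ISO 8601 attempt, and it returns null where the real extractor would apply its own failure handling.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Locale;

public class DateFallbackSketch {

    /** Returns the first successful parse, or null if every pattern fails. */
    static Date parse(String text, List<String> patterns) {
        for (String pattern : patterns) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            format.setLenient(false); // reject out-of-range fields such as month 13
            try {
                // Note: parse(String) may ignore trailing text after a match
                return format.parse(text);
            } catch (ParseException e) {
                // this pattern did not match; fall through to the next one
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> patterns = Arrays.asList(
                "yyyy.MM.dd G 'at' HH:mm:ss z",
                "yyyy-MM-dd",
                "dd/MM/yyyy");
        System.out.println(parse("15/06/2007", patterns));  // matched by the third pattern
        System.out.println(parse("not a date", patterns));  // no pattern matches
    }
}
```

Because the first match wins, more specific patterns are best listed before more permissive ones.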
When an aspect-defined property is extracted and added to the document's metadata, the associated aspect is implicitly added. By default, any values already present in the metadata remain, but this behaviour can be changed system-wide so that aspect properties that were not extracted are removed from the target node. To do so, override the extract-metadata bean and set carryAspectProperties to false.
<configRoot>/alfresco/action-service-context.xml:
<bean id='extract-metadata' class='org.alfresco.repo.action.executer.ContentMetadataExtracter' parent='action-executer'>
    ...
    <property name='carryAspectProperties'>
        <value>false</value>
    </property>
</bean>
For example, if an aspect defines properties p:x and p:y but the document only contains p:x, then p:y will be removed from the target node.
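The p:x / p:y example can be traced with a short Java sketch. It is a hypothetical model only: the class CarryAspectSketch and its merge method are invented for illustration, property names are plain strings, and the overwrite policy discussed earlier is not modelled (extracted values are simply written).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class CarryAspectSketch {

    /**
     * Writes the extracted values onto a copy of the node properties. When
     * carryAspectProperties is false, properties defined by the aspect that
     * were not extracted are removed from the result.
     */
    static Map<String, Object> merge(Map<String, Object> nodeProperties,
                                     Map<String, Object> extracted,
                                     Set<String> aspectProperties,
                                     boolean carryAspectProperties) {
        Map<String, Object> result = new LinkedHashMap<>(nodeProperties);
        result.putAll(extracted);
        if (!carryAspectProperties) {
            for (String property : aspectProperties) {
                if (!extracted.containsKey(property)) {
                    result.remove(property);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> node = new LinkedHashMap<>();
        node.put("p:x", "old x");
        node.put("p:y", "old y");

        Map<String, Object> extracted = new LinkedHashMap<>();
        extracted.put("p:x", "new x"); // the document only contains p:x

        Set<String> aspect = new LinkedHashSet<>(Arrays.asList("p:x", "p:y"));

        System.out.println(merge(node, extracted, aspect, true));  // p:y is carried over
        System.out.println(merge(node, extracted, aspect, false)); // p:y is removed
    }
}
```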
The metadata extractor is not available as a root service in JavaScript, but it is available as an action.
var action = actions.create('extract-metadata');
action.execute(document);