How to create a custom model for extracting new meta-data
This article appears to be a work in progress. For another explanation of creating a custom model, see the Step by Step Model Definition section of the Data Dictionary Guide.
Alfresco lets developers define custom content models. This flexibility makes it easy to extend the platform so that it covers new application domains.
A new content model can serve as the basis for a new meta-data extractor, similar to the extractors already built into the Alfresco platform.
Each file uploaded to the Alfresco repository has its own meta-data: information associated with the file that describes the file itself. For example, title, author, creation date, and name are default meta-data associated with files. They describe the main characteristics of the file.
Different kinds of files carry different meta-data. For example, an MP3 file has attributes such as album title, song title, artist, genre, and composer, which are quite different from the meta-data of a Word file. When an MP3 player loads an MP3 file, it tries to extract all the known meta-data and display them to the user. Alfresco works in a similar way. When a file is uploaded to the repository, a meta-data extractor, if one is explicitly associated with that file, tries to extract the specific meta-data and load the values associated with them. Alfresco then displays the meta-data properties in the web browser.
Once the meta-data associated with a file has been extracted, the user can run advanced searches in Alfresco on that meta-data: each meta-data property can be used as a filter in queries against the repository.
How can we create a new content model and a new extractor for specific meta-data?
The following sections show the process.
Creating a new custom content model involves the following steps:
Creating a new Content Model file (.xml file) based on the Alfresco dictionary.
Making the new content model file available to the Alfresco container as an extension.
Creating a context file that points to the new content model.
Making the elements defined by the new content model available to the web client.
Creating a Java package for extracting a predefined list of meta-data.
Allowing the execution of the Java classes when a specified type of file is uploaded to the Alfresco repository.
[to be completed]
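As a rough illustration of the context-file step, the fragment below sketches a Spring context file that registers a custom model with the Alfresco dictionary at start-up. It follows the pattern of the sample extension context files shipped with Alfresco. The bean id mage.dictionaryBootstrap and the file name mageModel.xml are hypothetical names chosen for this example, and the alfresco/extension location assumes the standard extension directory on the classpath.

```xml
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
<beans>
  <!-- Registers the custom model with the dictionary when the repository starts.
       The bean id and the model file name are assumptions made for this sketch. -->
  <bean id='mage.dictionaryBootstrap' parent='dictionaryModelBootstrap' depends-on='dictionaryBootstrap'>
    <property name='models'>
      <list>
        <!-- Classpath-relative path to the content model file (hypothetical name) -->
        <value>alfresco/extension/mageModel.xml</value>
      </list>
    </property>
  </bean>
</beans>
```

A file of this kind is typically saved in the extension directory under a name ending in -context.xml so that Alfresco picks it up when it starts.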
1. Reviewing the Alfresco Data Dictionary and Content Model
Each Content Model is based on the Alfresco Data Dictionary, a UML model describing entities, properties, and relationships among object instances. Designing and developing an efficient custom content model requires a thorough understanding of the Alfresco Data Dictionary. For complete documentation, see the Data Dictionary Guide.
2. Creating a New Content Model File
The first step is to design a new Content Model file: an .xml file that describes the meta-data you want to associate with specific files and have Alfresco extract at run-time when a file is uploaded.
The following is an example of a new Content Model file.
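<?xml version='1.0' encoding='UTF-8'?>
<!-- The opening model element and the imports were missing from the original listing.
     The model name 'mage:magemodel' is an assumed placeholder. -->
<model name='mage:magemodel' xmlns='http://www.alfresco.org/model/dictionary/1.0'>

  <!-- Import the dictionary (d:) and content (cm:) namespaces used below -->
  <imports>
    <import uri='http://www.alfresco.org/model/dictionary/1.0' prefix='d'/>
    <import uri='http://www.alfresco.org/model/content/1.0' prefix='cm'/>
  </imports>

  <!-- Namespace owned by this custom model -->
  <namespaces>
    <namespace uri='mage.model' prefix='mage'/>
  </namespaces>

  <types>
    <!-- Custom content type carrying the meta-data to be extracted -->
    <type name='mage:meta'>
      <title>MAGE-ML Metadata Template for microarray experiments</title>
      <parent>cm:content</parent>
      <archive>true</archive>
      <properties>
        <property name='mage:projectName'>
          <title>Name of the microarray project</title>
          <description>Name of the microarray project</description>
          <type>d:text</type>
          <mandatory>true</mandatory>
        </property>
        <property name='mage:projectDescription'>
          <title>Description of the project</title>
          <description>Description of the project</description>
          <type>d:text</type>
          <mandatory>true</mandatory>
        </property>
        <property name='mage:projectCreationDate'>
          <title>Creation date of the project</title>
          <description>Creation date of the project</description>
          <type>d:text</type>
          <mandatory>true</mandatory>
        </property>
        <property name='mage:patientId'>
          <title>Patient identifier</title>
          <description>Patient identifier</description>
          <type>d:text</type>
          <mandatory>true</mandatory>
        </property>
        <property name='mage:patientAge'>
          <title>Age of the patient</title>
          <description>Age of the patient</description>
          <type>d:text</type>
          <mandatory>false</mandatory>
        </property>
      </properties>
    </type>
  </types>
</model>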
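To make the type defined above visible in the web client, a web client configuration override is usually needed as well. The fragment below is a minimal sketch of such an override. It assumes the Alfresco Explorer web client and a file named web-client-config-custom.xml in the extension directory. Which properties to show, and in what order, is a choice made for this example only.

```xml
<alfresco-config>

  <!-- Offer mage:meta as a selectable type in the add-content and create-content wizards -->
  <config evaluator='string-compare' condition='Content Wizards'>
    <content-types>
      <type name='mage:meta'/>
    </content-types>
  </config>

  <!-- Show the custom properties on the property sheet of mage:meta nodes -->
  <config evaluator='node-type' condition='mage:meta'>
    <property-sheet>
      <show-property name='mage:projectName'/>
      <show-property name='mage:projectDescription'/>
      <show-property name='mage:projectCreationDate'/>
      <show-property name='mage:patientId'/>
      <show-property name='mage:patientAge'/>
    </property-sheet>
  </config>

</alfresco-config>
```

The first config block makes the type selectable when content is added, and the second controls which properties appear when the details of a mage:meta node are viewed or edited.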