The importance of the XML language in clinical-genetic domains
XML plays an important role in clinical-genetic domains, where exchanging information between systems is a well-known challenge in the biomedical world. Adopting XML for biomedical information interoperability has changed how clinical and genomic data are represented in two respects:
allowing interoperability among heterogeneous systems in order to exchange and share clinical and genetic information;
allowing the definition and representation of clinical and genetic information in a structured way.
The MAGE-ML standard for microarray (genetic) data representation
Many international standards have been developed to define a common structure for clinical and genetic documents. For microarray genetic data, the most important and best known among health care providers, labs and researchers is MAGE-ML (for further details please visit http://www.mged.org/Workgroups/MAGE/introduction.html). MAGE-ML (Microarray and Gene Expression Markup Language) is a UML model implemented in XML for microarray expression experiments. Microarray experiments are carried out by genetic labs and are a particular kind of genetic experiment that evaluates gene expression profiles in individuals. The MIAME protocol defines the Minimum Information About a Microarray Experiment, that is, the set of information (metadata) that each lab conducting a microarray experiment has to provide so that other labs can replicate the experimental conditions.
An example of MAGE-ML metadata
An example of a MAGE-ML document follows (only a few lines):
The MAGE consortium has developed a set of open source Java APIs that allow the manipulation of XML documents validated against a specific MAGE-ML schema and the extraction of both experimental and clinical information. The MAGE-ML Java APIs are based on other open source Java packages for accessing and manipulating XML documents via DOM, SAX and XPath.
Objectives of the project could be the following:
Recognize MAGE-ML documents natively (automatic recognition based on specific metadata);
Extract some specific metadata from MAGE-ML documents. The extraction of metadata could be implemented in both a static and a dynamic way (please see the section "Discussion of Design/Implementation Approach" below);
Associate some specific metadata with MAGE-ML documents in order to allow users (lab users) to execute advanced searches on MAGE-ML metadata.
Can you elaborate on what the advanced searches might be and, at a high level, what they involve?
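The first objective above, native recognition of MAGE-ML documents, could be sketched as follows. This is only a minimal illustration using the JDK's StAX parser, not the MAGE consortium's APIs. It assumes that a MAGE-ML document can be identified by a root element named MAGE-ML; the class and method names (MageMlRecognizer, looksLikeMageMl) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Alfresco or of the MAGE-ML Java APIs.
class MageMlRecognizer {

    // Assumption: the root element of a MAGE-ML document is named "MAGE-ML".
    static final String ASSUMED_ROOT_ELEMENT = "MAGE-ML";

    // Returns true if the first element of the XML text has the assumed root name.
    // Text that is not well-formed XML is reported as "not MAGE-ML".
    static boolean looksLikeMageMl(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Recognition only needs the root element, so DTD processing
        // and external entity resolution are switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        // Only the first start tag (the root) is inspected.
                        return ASSUMED_ROOT_ELEMENT.equals(reader.getLocalName());
                    }
                }
                return false;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            return false;
        }
    }
}
```

Because only the first start tag is read, the check stays cheap even for large experiment files. A stricter variant could validate the document against the MAGE-ML schema instead of relying on the root element name alone.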
Initial Project Scope
The general scope of the project is to extend the core Alfresco classes for metadata extraction so that Alfresco lab users can associate MAGE-ML metadata with specific XML documents.
At the very beginning, we could think about a predefined, static list of MAGE-ML metadata linked to a document.
Afterwards, a more challenging goal could be to let users define the list of MAGE-ML metadata they are interested in dynamically and flexibly, through a configuration file.
Discussion of Design/Implementation Approach
From the implementation point of view we could think about the following steps:
Extending the set of Java classes of the Alfresco repo by creating a new class, called MageMlMetadataExtracter.java, similar to existing ones such as OpenOfficeMetadataExtracter.java. This new class will be able to extract a set of predefined metadata in a static way;
Defining an XML configuration file that contains a predefined list of MAGE-ML metadata that a typical lab user may want to extract;
Defining an empty XML configuration file that will contain a dynamic list of MAGE-ML metadata that a typical lab user may want to extract in addition to the predefined metadata.
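The combination of a predefined list and a user-defined list could be sketched as follows. This is a rough illustration under several assumptions, not the proposed Alfresco implementation: each metadata item is assumed to be a name mapped to an XPath expression, the configuration is written in Java properties syntax as a stand-in for the XML configuration files described above, and the class name ConfigurableXmlMetadata, the metadata names and the Lab element in the usage note are hypothetical. Integration with the Alfresco extracter classes is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Alfresco or of the MAGE-ML Java APIs.
class ConfigurableXmlMetadata {

    // Parses "metadataName=xpathExpression" lines (properties syntax,
    // used here in place of the XML configuration file).
    static Properties load(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Evaluates every configured XPath expression against the document.
    // User-defined entries are added to the predefined ones and override
    // them when both use the same metadata name.
    static Map<String, String> extract(String xml, Properties predefined, Properties userDefined) {
        Properties merged = new Properties();
        merged.putAll(predefined);
        merged.putAll(userDefined);
        try {
            // The document is parsed once; DTD handling is not addressed in this sketch.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> result = new TreeMap<>();
            for (String name : merged.stringPropertyNames()) {
                String value = xpath.evaluate(merged.getProperty(name), doc);
                if (!value.isEmpty()) {
                    // Expressions that match nothing are skipped.
                    result.put(name, value);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Metadata extraction failed", e);
        }
    }
}
```

For instance, with a predefined entry identifier=/MAGE-ML/@identifier and a user-defined entry labName=/MAGE-ML/Lab/@name, the extraction would return both values for a document that contains them. Leaving the user-defined configuration empty gives the static behaviour of the first step.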
The metadata extraction policy is still under evaluation, so we could discuss it in order to identify the most suitable strategy.