How to preserve original document create and modified date during upload

Moderator

@sanjaybandhniya I hope you have read the information shared above:

"TikaAutoMetadataExtracter takes care of other mimetypes that don't have specific extractors. It uses AutoDetectParser for parsing and extraction, e.g. for images."

That reply also gave examples for TikaAutoMetadataExtracter and the other extractors, under a heading in bold letters: "An example for Images, PDF, Office".

Look at this bean definition, which was also provided in the response above:

<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter" parent="baseMetadataExtracter">

Senior Member II

These are my bean definitions:

	<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter"
		parent="baseMetadataExtracter">
		<constructor-arg>
			<ref bean="tikaConfig" />
		</constructor-arg>
		<property name="overwritePolicy">
			<value>EAGER</value>
		</property>
		<property name="mappingProperties">
			<bean
				class="org.springframework.beans.factory.config.PropertiesFactoryBean">
				<property name="location">
					<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
				</property>
			</bean>
		</property>
	</bean>

	<bean id="extracter.PDFBox"
		class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter"
		parent="baseMetadataExtracter">
		<property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
		<property name="overwritePolicy">
			<value>EAGER</value>
		</property>
		<property name="mappingProperties">
			<bean
				class="org.springframework.beans.factory.config.PropertiesFactoryBean">
				<property name="location">
					<value>classpath:alfresco/metadata/PdfBoxMetadataExtracter.properties</value>
				</property>
			</bean>
		</property>
	</bean>

	<bean id="extracter.Poi"
		class="org.alfresco.repo.content.metadata.PoiMetadataExtracter"
		parent="baseMetadataExtracter">
		<property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
		<property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
		<property name="poiAllowableXslfRelationshipTypes">
			<list>
				<!-- These values are valid for Office 2007, 2010 and 2013 -->
				<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
				<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
			</list>
		</property>
		<property name="overwritePolicy">
			<value>EAGER</value>
		</property>
		<property name="mappingProperties">
			<bean
				class="org.springframework.beans.factory.config.PropertiesFactoryBean">
				<property name="location">
					<value>classpath:alfresco/metadata/PoiMetadataExtracter.properties</value>
				</property>
			</bean>
		</property>
	</bean>

Thanks & Regards,
Sanjay
Moderator

Your bean definitions look correct. What is the configuration in these files?

PdfBoxMetadataExtracter.properties
PoiMetadataExtracter.properties
TikaAutoMetadataExtracter.properties

Senior Member II

The properties files contain my custom property mappings:

namespace.prefix.ks=http://www.alfresco.com/model/custom-model/1.0
created=ks:originalCreationDate
modified=ks:originalModificationDate

My content model:

<aspects>
		<aspect name="ks:importedDoc">
			<properties>
				<property name="ks:originalCreationDate">
					<type>d:date</type>
				</property>
				<property name="ks:originalModificationDate">
					<type>d:date</type>
				</property>
			</properties>
		</aspect>
	</aspects>

This works for PDF and Office files.

Thanks & Regards,
Sanjay
Moderator

Hmm, that is odd. I think it should work. Let me try it on my end and see what I get.

Moderator

It seems to work perfectly on my end. Try re-checking your configuration and logs and see what you get.

Here is the test I did:

<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter"
	parent="baseMetadataExtracter">
	<constructor-arg>
		<ref bean="tikaConfig" />
	</constructor-arg>
	<property name="overwritePolicy">
		<value>EAGER</value>
	</property>
	<property name="mappingProperties">
		<bean
			class="org.springframework.beans.factory.config.PropertiesFactoryBean">
			<property name="location">
				<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>
			</property>
		</bean>
	</property>
</bean>

<bean id="extracter.PDFBox"
	class="org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter"
	parent="baseMetadataExtracter">
	<property name="documentSelector" ref="pdfBoxEmbededDocumentSelector" />
	<property name="overwritePolicy">
		<value>EAGER</value>
	</property>
	<!-- Including custom properties -->
	<property name="mappingProperties">
		<bean
			class="org.springframework.beans.factory.config.PropertiesFactoryBean">
			<property name="location">
				<value>classpath:alfresco/metadata/PdfBoxMetadataExtracter.properties</value>
			</property>
		</bean>
	</property>
</bean>

<bean id="extracter.Poi"
	class="org.alfresco.repo.content.metadata.PoiMetadataExtracter"
	parent="baseMetadataExtracter">
	<property name="poiFootnotesLimit" value="${content.transformer.Poi.poiFootnotesLimit}" />
	<property name="poiExtractPropertiesOnly" value="${content.transformer.Poi.poiExtractPropertiesOnly}" />
	<property name="poiAllowableXslfRelationshipTypes">
		<list>
			<!-- These values are valid for Office 2007, 2010 and 2013 -->
			<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/presProps</value>
			<value>http://schemas.openxmlformats.org/officeDocument/2006/relationships/viewProps</value>
		</list>
	</property>
	<property name="overwritePolicy">
		<value>EAGER</value>
	</property>
	<!-- Including custom properties -->
	<property name="mappingProperties">
		<bean
			class="org.springframework.beans.factory.config.PropertiesFactoryBean">
			<property name="location">
				<value>classpath:alfresco/metadata/PoiMetadataExtracter.properties</value>
			</property>
		</bean>
	</property>
</bean>

TikaAutoMetadataExtracter.properties

# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0
namespace.prefix.exif=http://www.alfresco.org/model/exif/1.0
namespace.prefix.audio=http://www.alfresco.org/model/audio/1.0

# Custom model namespace
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
description=cm:description
created=cm:created

# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate

geo\:lat=cm:latitude
geo\:long=cm:longitude
tiff\:ImageWidth=exif:pixelXDimension
tiff\:ImageLength=exif:pixelYDimension
tiff\:Make=exif:manufacturer
tiff\:Model=exif:model
tiff\:Software=exif:software
tiff\:Orientation=exif:orientation
tiff\:XResolution=exif:xResolution
tiff\:YResolution=exif:yResolution
tiff\:ResolutionUnit=exif:resolutionUnit
exif\:Flash=exif:flash
exif\:ExposureTime=exif:exposureTime
exif\:FNumber=exif:fNumber
exif\:FocalLength=exif:focalLength
exif\:IsoSpeedRatings=exif:isoSpeedRatings
exif\:DateTimeOriginal=exif:dateTimeOriginal
xmpDM\:album=audio:album
xmpDM\:artist=audio:artist
xmpDM\:composer=audio:composer
xmpDM\:engineer=audio:engineer
xmpDM\:genre=audio:genre
xmpDM\:trackNumber=audio:trackNumber
xmpDM\:releaseDate=audio:releaseDate
#xmpDM:logComment
xmpDM\:audioSampleRate=audio:sampleRate
xmpDM\:audioSampleType=audio:sampleType
xmpDM\:audioChannelType=audio:channelType
xmpDM\:audioCompressor=audio:compressor

PdfBoxMetadataExtracter.properties

# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0

# Custom model namespace
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
subject=cm:description

# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate

PoiMetadataExtracter.properties

# Namespaces
namespace.prefix.cm=http://www.alfresco.org/model/content/1.0

# Custom model namespace
namespace.prefix.demo=http://www.github.com/abhinavmishra14/model/demo/1.0

# OOTB Default Mappings
author=cm:author
title=cm:title
description=cm:description

# Custom Properties to be mapped
created=demo:originCreatedDate
modified=demo:originModifiedDate

Content model:

<aspect name="demo:testAuditMetadata">
	<title>Test Audit Metadata</title>
	<description>Test Audit Metadata</description>
	<properties>
		<property name="demo:originCreatedDate">
			<title>Original Created Date</title>
			<description>Original Created Date</description>
			<type>d:text</type>
		</property>
		<property name="demo:originModifiedDate">
			<title>Original Modified Date</title>
			<description>Original Modified Date</description>
			<type>d:text</type>
		</property>
	</properties>
</aspect>
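
For the mappings to resolve against this aspect, the URI registered under namespace.prefix.demo in each properties file should be identical to the URI the content model declares for the demo prefix. Below is a minimal sketch of the surrounding model file. The model name demo:metadataModel is an assumption made for illustration and is not taken from this thread; the aspect is the one shown above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the model name "demo:metadataModel" is hypothetical -->
<model name="demo:metadataModel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
	<imports>
		<import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d" />
		<import uri="http://www.alfresco.org/model/content/1.0" prefix="cm" />
	</imports>
	<namespaces>
		<!-- Should match namespace.prefix.demo in the extracter properties files -->
		<namespace uri="http://www.github.com/abhinavmishra14/model/demo/1.0" prefix="demo" />
	</namespaces>
	<aspects>
		<aspect name="demo:testAuditMetadata">
			<title>Test Audit Metadata</title>
			<properties>
				<property name="demo:originCreatedDate">
					<type>d:text</type>
				</property>
				<property name="demo:originModifiedDate">
					<type>d:text</type>
				</property>
			</properties>
		</aspect>
	</aspects>
</model>
```

If the two URIs differ, even by a trailing slash, the mapped QName will not correspond to a property defined in the model.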

Log:

Image Extraction:
Mapped and Accepted: {{http://www.alfresco.org/model/exif/1.0}focalLength=4.5, {http://www.alfresco.org/model/exif/1.0}model=TG-5, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}flash=false, {http://www.alfresco.org/model/exif/1.0}fNumber=8.0, {http://www.alfresco.org/model/exif/1.0}isoSpeedRatings=100, {http://www.alfresco.org/model/content/1.0}description={en_US=OLYMPUS DIGITAL CAMERA}, {http://www.alfresco.org/model/exif/1.0}dateTimeOriginal=Sun May 07 13:23:51 EDT 2017, {http://www.alfresco.org/model/exif/1.0}manufacturer=OLYMPUS CORPORATION, {http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2017-05-07T13:23:51, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=590, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=442, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.alfresco.org/model/exif/1.0}exposureTime=0.005}
2020-05-22 09:58:00,041 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-14] Completed metadata extraction: 
reader:    ContentAccessor[ contentUrl=store://2020/5/22/9/57/db13881d-4caf-4a72-a481-054bb9246b63.bin, mimetype=image/jpeg, size=94399, encoding=UTF-8, locale=en_US]
extracter: org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter@126bd574
changed:   {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2017-05-07T13:23:51,{http://www.alfresco.org/model/exif/1.0}focalLength=4.5, {http://www.alfresco.org/model/exif/1.0}model=TG-5, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/exif/1.0}flash=false, {http://www.alfresco.org/model/exif/1.0}fNumber=8.0, {http://www.alfresco.org/model/exif/1.0}isoSpeedRatings=100, {http://www.alfresco.org/model/content/1.0}description={en_US=OLYMPUS DIGITAL CAMERA}, {http://www.alfresco.org/model/exif/1.0}dateTimeOriginal=Sun May 07 13:23:51 EDT 2017, {http://www.alfresco.org/model/exif/1.0}manufacturer=OLYMPUS CORPORATION, {http://www.alfresco.org/model/exif/1.0}pixelXDimension=590, {http://www.alfresco.org/model/exif/1.0}pixelYDimension=442, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.alfresco.org/model/exif/1.0}exposureTime=0.005}

PDF Extraction:
Mapped and Accepted: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2018-10-26T20:36:24Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2018-10-26T20:36:28Z}
2020-05-22 09:58:11,676 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-2] Completed metadata extraction: 
reader:    ContentAccessor[ contentUrl=store://2020/5/22/9/58/262e3dc1-5cfc-4558-9f01-fae20c5cae2d.bin, mimetype=application/pdf, size=3104712, encoding=UTF-8, locale=en_US]
extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@8414655
changed:   {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2018-10-26T20:36:24Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=null, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2018-10-26T20:36:28Z}

Office extraction:
Mapped and Accepted: {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2020-02-10T16:13:00Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=Abhinav, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2020-02-10T20:05:00Z}
2020-05-22 09:58:22,021 DEBUG [org.alfresco.repo.content.metadata.AbstractMappingMetadataExtracter] [http-bio-8080-exec-11] Completed metadata extraction: 
reader:    ContentAccessor[ contentUrl=store://2020/5/22/9/58/f3281f14-7ffb-4d91-a3b2-d0fc8de305d5.bin, mimetype=application/vnd.openxmlformats-officedocument.wordprocessingml.document, size=3075453, encoding=UTF-8, locale=en_US]
extracter: org.alfresco.repo.content.metadata.PoiMetadataExtracter@2752d52e
changed:   {{http://www.github.com/abhinavmishra14/model/demo/1.0}originCreatedDate=2020-02-10T16:13:00Z, {http://www.alfresco.org/model/content/1.0}title=null, {http://www.alfresco.org/model/content/1.0}author=Abhinav, {http://www.github.com/abhinavmishra14/model/demo/1.0}originModifiedDate=2020-02-10T20:05:00Z}

Image metadata on the Share details view:

[Screenshot: image metadata and original created/modified dates extracted via TikaAutoMetadataExtracter]

PDF and Office metadata on the Share details view:

[Screenshots: metadata extracted via PdfBoxMetadataExtracter; metadata extracted via PoiMetadataExtracter]
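
For extracted values to show up on the Share details page, the aspect's properties generally also need to be exposed in the Share form configuration. The Share configuration used for this test is not shown in the thread, so the following is only a minimal sketch of a share-config-custom.xml fragment, assuming the demo:testAuditMetadata aspect and its two properties from the content model above.

```xml
<alfresco-config>
	<!-- Sketch only: applies to any node that carries the demo:testAuditMetadata aspect -->
	<config evaluator="aspect" condition="demo:testAuditMetadata">
		<forms>
			<form>
				<field-visibility>
					<show id="demo:originCreatedDate" />
					<show id="demo:originModifiedDate" />
				</field-visibility>
			</form>
		</forms>
	</config>
</alfresco-config>
```

With an aspect-based evaluator, the fields appear only on nodes where the extractor has actually applied the aspect, which makes it a quick visual check of whether extraction ran for a given file type.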

Senior Member II

Hi,

If possible, could you share the demo you created? It is still not working for images, even though I used your code.

Thanks & Regards,
Sanjay
Moderator

@sanjaybandhniya Please share your content model, Share config, bean definitions, extractor properties files, and logs here.

Moderator

@sanjaybandhniya  Find the demo project here:

https://github.com/abhinavmishra14/alfresco-metadataextraction-demo

I noticed a difference between the Community and Enterprise editions. The examples I gave above work fine on Enterprise 5.2.x (I used 5.2.6) and 6.1.x (I used 6.1), but on Community editions the properties files are not picked up reliably (the behavior is intermittent).

The only change I made for Community edition is the location path shown below, and with it the file is always picked up correctly.

<property name="mappingProperties">
	<bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
		<property name="location">
			<value>classpath:alfresco/module/${project.artifactId}/metadataextraction/TikaAutoMetadataExtracter.properties</value>
		</property>
	</bean>
</property>

On Enterprise edition, both the path above and the path below work fine:

<value>classpath:alfresco/metadata/TikaAutoMetadataExtracter.properties</value>

This one also works on both editions:

<value>classpath:alfresco/extension/metadata/TikaAutoMetadataExtracter.properties</value>

https://github.com/abhinavmishra14/alfresco-metadataextraction-demo/blob/master/metadata-extractor-d...

I am not sure how the two editions (Community and Enterprise) differ in terms of extension points. I tried looking at the source code but found no clues. The good news is that the other path I shared above (available in the demo project) works fine for both Community and Enterprise editions.
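
Note that ${project.artifactId} in the module path is a Maven property, so it is only replaced if the context file goes through Maven resource filtering at build time, as Alfresco SDK projects are typically set up to do. Below is a sketch of what the resolved bean would look like in the deployed module, assuming a hypothetical artifactId of my-platform-module (not the demo project's actual module name).

```xml
<bean id="extracter.TikaAuto" class="org.alfresco.repo.content.metadata.TikaAutoMetadataExtracter"
	parent="baseMetadataExtracter">
	<constructor-arg>
		<ref bean="tikaConfig" />
	</constructor-arg>
	<property name="overwritePolicy">
		<value>EAGER</value>
	</property>
	<property name="mappingProperties">
		<bean class="org.springframework.beans.factory.config.PropertiesFactoryBean">
			<property name="location">
				<!-- "my-platform-module" is a hypothetical module ID standing in for ${project.artifactId} -->
				<value>classpath:alfresco/module/my-platform-module/metadataextraction/TikaAutoMetadataExtracter.properties</value>
			</property>
		</bean>
	</property>
</bean>
```

If the placeholder still appears literally in the deployed context file, the location will not resolve, so it is worth checking the built artifact rather than the source file.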

I hope this helps narrow down your issue.
