Links
Historical version of this page for 4.0 and 3.4
Introduction
In Alfresco Community Edition 201802-EA the OpenOffice transformers were replaced by the JodConverter transformers to provide better reliability in terms of communication and automatic restarts. Until this release JodConverter transformers had only been available to Enterprise customers, but in both cases they had been using the same LibreOffice process to produce the transformations. This page still contains references to OpenOffice, but the same approach to changing JodConverter applies. You will also see that the getProperties operation no longer returns anything about OpenOffice and the 'edition' properties that indicate JodConverter is only available in Enterprise have been removed. To help with the upgrade to ACE 201802-EA, the OpenOffice properties will still be used for the JodConverter as long as a JodConverter property has not been supplied. The JodConverter property takes precedence over any OpenOffice property of the same format.
In Alfresco 4.2d much of the configuration of transformers is done using Alfresco global properties. In the case of the Enterprise edition these may be changed dynamically via JMX without stopping the server. Prior to this it was possible to control Content Transformer limits to a more limited extent using Spring XML and a few linked Alfresco global properties (see Content Transformation Limits 4.0). The 4.2d release does away with the need for most of the Spring configuration and adds greater control, flexibility and functionality. In the Enterprise edition it also adds JMX operations to examine the transformation log, debug log and statistics, run test transforms and discover which transformations are possible.
In previous versions transformation limits, supported transformations, unsupported transformations and explicit transformations could be defined in Spring. This approach has now been deprecated. If there are custom Spring files in existence, the system will dynamically set properties to achieve the same result on start up. See the Deprecated Spring Configuration section for more detail.
To explore the transformer configuration functionality, this page describes the JMX operations that are only available in the Enterprise edition, but it should be emphasised that normal alfresco-global.properties may be used in the Community edition to set the same configuration.
Using JConsole to access the Transformer JMX operations
JConsole may be used to access the managed bean (MBean) used to control the Transformation sub system. It may be found under Alfresco|ContentTransformer|Configuration|Operations.
- getProperties(listAll) Lists all transformer properties that are set.
- setProperties(propertyNamesAndValues) Adds or replaces transformer properties.
- removeProperties(String propertyNames) Removes transformer properties.
- getTransformationLog(n) Lists the latest entries in the transformation log.
- getTransformationDebugLog(n) Lists the latest entries in the transformation debug log.
- getExtensionsAndMimetypes() Lists all configured mimetypes and the primary file extension.
- getTransformerNames() Lists the names of all top level transformers.
- getTransformationsByExtension(sourceExtension, targetExtension, use) Lists all possible transformations sorted by source and then target mimetype extension.
- getTransformationsByTransformer(transformerName, use) Lists all possible transformations sorted by Transformer name.
- getTransformationStatistics(transformerName, sourceExtension, targetExtension) Lists the transformation statistics for the current node.
- testTransform(transformerName, sourceExtension, targetExtension, use) Transforms a small test file from one mimetype to another and then shows the debug of the transform.
getProperties
- getProperties(listAll)
- Lists all transformer properties that are set.
- - listAll if true, list both default and custom values, otherwise it includes only custom properties or modified standard values
# LOG and DEBUG history sizes
# ===========================
# Use small values as these logs are held in memory. 0 to disable.
# transformer.log.entries=50
transformer.debug.entries=22 # default=0
# Default transformer settings
# ============================
# content.transformer.default.count=100000
# content.transformer.default.errorTime=120000
# content.transformer.default.maxPages=-1
# content.transformer.default.maxSourceSizeKBytes=-1
# content.transformer.default.pageLimit=-1
# content.transformer.default.priority=100
# content.transformer.default.readLimitKBytes=-1
# content.transformer.default.readLimitTimeMs=-1
# content.transformer.default.thresholdCount=3
# content.transformer.default.time=0
# content.transformer.default.timeoutMs=120000
# Transformers with configuration settings
# ========================================
# Commented out settings are hard coded values for information purposes
# JodConverter
# ------------
# Transformations supported by OpenOffice/LibreOffice
content.transformer.JodConverter.priority=99 # default=110
# content.transformer.JodConverter.extensions.*.docm.supported=false
# content.transformer.JodConverter.extensions.*.docx.supported=false
# content.transformer.JodConverter.extensions.*.dotm.supported=false
# content.transformer.JodConverter.extensions.*.dotx.supported=false
...
# content.transformer.JodConverter.extensions.doc.pdf.maxSourceSizeKBytes=10240
# content.transformer.JodConverter.extensions.docm.pdf.maxSourceSizeKBytes=768
# content.transformer.JodConverter.extensions.docx.pdf.maxSourceSizeKBytes=768
# content.transformer.JodConverter.extensions.dotm.pdf.maxSourceSizeKBytes=768
# content.transformer.JodConverter.extensions.dotx.pdf.maxSourceSizeKBytes=768
# content.transformer.JodConverter.extensions.html.pdf.supported=false
# content.transformer.JodConverter.extensions.potm.pdf.maxSourceSizeKBytes=4096
# content.transformer.JodConverter.extensions.potx.pdf.maxSourceSizeKBytes=4096
...
# remoteServer
# ------------
# Unregistered transformer
# content.transformer.remoteServer.priority=40
# Transformers without extra configuration settings
# =================================================
# StringExtracter
# ---------------
# Only supports transformation of js and mimetypes starting with 'text/' to txt.
# MediaWikiParser
# ---------------
# Only supports mw to html
...
All transformation properties (listAll=true) - has been cut down
As can be seen there are four sections, although a section is omitted if it contains no entries. These are:
- LOG and DEBUG history sizes
- Properties used to turn on or off the capture of log and debug messages. As these are held in memory, don't use very large numbers. A value of 0 turns off the capture. As can be seen, the log value is commented out, which indicates that the default value (50) is being used. In the case of debug, the standard default (0) has been replaced by a value of 22. The getTransformationLog and getTransformationDebugLog operations are used to access any captured log or debug entries.
- Default transformer settings
- Properties that provide defaults for all transformers. If the transformer does not provide more specific values the default is used.
- Transformers with configuration settings
- Properties that provide values specific to a named transformer. The transformer name may be followed by a comment provided by ContentTransformerHelper.getComments(boolean).
- Transformers without extra configuration settings
- Transformers that don't have any transformer properties. Included so that comments and implied properties can be included.
- A 'Complex' transformer defined in Spring has an implied 'pipeline' property.
- A 'Failover' transformer defined in Spring has an implied 'failover' property.
# LOG and DEBUG history sizes
# ===========================
# Use small values as these logs are held in memory. 0 to disable.
transformer.debug.entries=10 # default=0
# Transformers with configuration settings
# ========================================
# Commented out settings are hard coded values for information purposes
# JodConverter
# ------------
# Transformations supported by OpenOffice/LibreOffice
content.transformer.JodConverter.priority=99 # default=110
Custom transformation properties (listAll=false)
The output when listAll is set to false makes it much simpler to find customisations.
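The getProperties output uses a simple convention: active lines are values that have been set, while commented out lines are either headings or hard coded defaults. The following Python sketch separates the two. It is only an illustration that assumes the text layout shown in the samples on this page; the function name is hypothetical and is not part of Alfresco.

```python
def parse_transformer_properties(text):
    """Split getProperties-style text into (active, commented_out) dicts."""
    active, commented_out = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        target = active
        if line.startswith("#"):
            # A commented out line: either a heading or a default value.
            line = line.lstrip("#").strip()
            target = commented_out
        # Drop a trailing comment such as "# default=0".
        line = line.split("#", 1)[0].strip()
        name, separator, value = line.partition("=")
        name = name.strip()
        # Assumption: real properties have a dotted name without spaces,
        # which filters out headings and underline rows.
        if separator and "." in name and " " not in name:
            target[name] = value.strip()
    return active, commented_out
```

Feeding it the listAll=true output would give the overridden values in the first dictionary and the untouched defaults in the second.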
setProperties
- setProperties(propertyNamesAndValues)
- Adds or replaces transformer properties.
- - propertyNamesAndValues to be set. May include comments but these are removed. To clear a custom value, set it back to the default. To remove a custom property use removeProperties(...)
This method may be used to set transformer properties like those returned from getProperties. In fact it is possible to modify the getProperties output and feed it back into setProperties, as all the comments are ignored. For example, if you modify the Custom transformation properties (above) so that they look like the following and pass the result to setProperties, it will reset the overridden properties.
# LOG and DEBUG history sizes
# ===========================
# Use small values as these logs are held in memory. 0 to disable.
transformer.debug.entries=0
# Transformers with configuration settings
# ========================================
# Commented out settings are hard coded values for information purposes
# JodConverter
# ------------
# Transformations supported by OpenOffice/LibreOffice
content.transformer.JodConverter.priority=110
Reset property overrides
In addition to log and debug properties there are lots of transformer properties that may be set (see Transformation Properties).
removeProperties
removeProperties(String propertyNames)
- Removes transformer properties.
- - propertyNames to be removed. May include =<value> after the property name. The value is ignored. Only custom properties should be removed.
Transformation properties fall into two groups.
- Standard - defined by Alfresco.
- These may not be removed, but may be overridden with setProperties. Standard property overrides may then be cleared by setting them back to their defaults. To effectively remove a standard property, either set a value that means it has no effect or add another property. For example it is possible to disable the standard transformer content.transformer.remoteServer by adding the following property.
content.transformer.remoteServer.available=false
- Custom properties - not supplied by default by Alfresco.
- These may be removed. One way to remove all custom properties is to use getProperties(false), edit the output to remove all lines that end in # default=<value> and then supply the result to removeProperties.
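The manual edit described above can be sketched in Python. This is a rough illustration only: it assumes the getProperties(false) output marks overridden standard properties with a trailing "# default" comment, as in the samples on this page, and the function name and the myTransformer property in the usage note are hypothetical.

```python
import re

# Matches the trailing marker that getProperties adds to overridden
# standard properties, e.g. "# default=110".
DEFAULT_MARKER = re.compile(r"#\s*default\b")


def custom_property_names(get_properties_output):
    """Return the names of properties that carry no '# default' marker."""
    names = []
    for raw in get_properties_output.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line, heading or commented out default
        if DEFAULT_MARKER.search(line):
            continue  # a standard property that has been overridden
        # removeProperties ignores any value, so the name alone is enough.
        names.append(line.split("=", 1)[0].strip())
    return names
```

For example, output containing transformer.debug.entries=10 # default=0 and content.transformer.myTransformer.priority=60 would yield only the second name, which could then be pasted into removeProperties.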
getTransformationLog
- getTransformationLog(n)
- Lists the latest entries in the transformation log.
- - n the number of entries to include. If blank all available entries are listed
The log consists of a single line per transformation request with the most recent at the top. The property transformer.log.entries is used to control the maximum number of entries captured in memory. If a value larger than 1000 is set it is ignored and 1000 is used. The log is per node to avoid lots of traffic between nodes. The log is cleared on setting or removing a property and on start up. Each line is made up of the following components (some are optional):
- The completion date and time
- A number that identifies the transformation request number. There may be gaps as some requests to the Transformers sub system are for information only.
- The source extension. This is the main extension of the mimetype so may not match the file extension if there are multiple extensions for the mimetype
- The target extension for the mimetype.
- INFO or ERROR indicating if there was a problem
- The source file name (last part of the path only)
- The size of the source file
- How long the transformation took
- The name of the top level transformer used
- The type of the transformer:
- <<Failover>> transformer tries a list of sub transformers one after the other until one works
- <<Compound>> transformer is a pipeline of other transformers
- <<Runtime>> transformer calls an external process
- <<Proxy>> transformer is in another sub system so its type is not known
- No value indicates the transformer is none of the above and is a base transformer
- Failure reason
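The size rules for the in-memory log (most recent entry first, a value of 0 disables capture, values above 1000 are capped) can be pictured with a bounded queue. The following Python sketch is an illustration of that behaviour only and is not how Alfresco implements it; the function names are hypothetical.

```python
from collections import deque

MAX_LOG_ENTRIES = 1000  # upper bound stated for transformer.log.entries


def make_log(configured_entries):
    """Create an in-memory log sized from the configured property value."""
    size = min(max(configured_entries, 0), MAX_LOG_ENTRIES)
    return deque(maxlen=size)


def record(log, line):
    """Add an entry at the top; the oldest entry drops off when full."""
    log.appendleft(line)
```

The debug log follows the same pattern with an upper bound of 100.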
getTransformationDebugLog
- getTransformationDebugLog(n)
- Lists the latest entries in the transformation debug log.
- - n the number of entries to include. If blank all available entries are listed
The property transformer.debug.entries is used to control the maximum number of entries captured in memory. If a value larger than 100 is set it is ignored and 100 is used. Just like the transformation log, the debug log has the most recent entry at the top, is per node and is cleared on setting or removing a property and on start up. Each entry is kept separate, unlike log4j messages which might overlap.
The debug log contains more information than the transformation log, including:
- selection of a transformer
- In request 21 (above) there are two transformers that may be used to transform a pdf to txt. The transformer called PdfBox is tried first (signified by **) as its priority is 50, whereas the other transformer, TikaAuto, has a priority of 120. The highest priority is the one with the lowest number (> 0). The default transformer priority, if not specified in properties, is 100. Should the first one fail, the next one in the list will be tried. The list of possible transformers also includes the average time taken by that transformer in the past. This is important when more than one transformer has the same priority (see Transformers with the same priority).
- the sequence of sub transforms used.
- The way to view this is to look at the numbers in the left column. The first number is the request number. The next number represents the execution of a transformer.
- The simplest case, as shown in request 21, is where the selected transformer just runs. Had PdfBox failed, the error would have been reported, the next transformer would have been tried and the second number would have been incremented.
- Request 20 shows an example of this. 20.1 is for the execution of the complex.PDF.Image<<Failover>> transformer which has been selected. A failover transformer simply specifies the order of transformers to be used rather than letting the system choose it based on priority. In this case they are ImageMagick given the number 20.1.1 and double.ImageMagick given the number 20.1.2.
- If further levels exist, extra numbers are added. For example the double.ImageMagick just described has the reference 20.1.2.1 and its first component transformer is given the reference 20.1.2.1.1
- If the transformation has not finished the last line says '
Subset of extensions and mimetypes
getTransformerNames
- getTransformerNames()
- Lists the names of all transformers
Simply lists all transformers. This includes top level transformers that may be automatically selected. This also includes component transformers that have an available property value of false. Such transformers may only be used if referenced as a named component of a Complex (pipeline) or Failover transformer. They do however appear in the output of getProperties.
Subset of transformer names
getTransformationsByExtension
- getTransformationsByExtension(sourceExtension, targetExtension, use)
- Lists all possible transformations sorted by source and then target mimetype extension.
- - sourceExtension to be checked. If blank all source mimetypes are included
- - targetExtension to be checked. If blank all target mimetypes are included.
- - use or context in which to test the transformation ('doclib', 'index', 'webpreview', 'syncRule', 'asyncRule'...) or blank for the default.
Lists all the transformations that are available between given mimetypes for a given use. Either mimetype may be blank. The following example simply specifies docx as the source. As can be seen, the count on the left lists the number of different combinations (note 40 appears twice as there are two transformers available from docx to tiff). The output also includes the priority. A [---] priority value indicates that the transformer will not be selected unless it is a named component transformer. The transformer name is followed by the average time (0 ms) that has been taken on this node since the Transformer sub system started. The last part of the line indicates any file size limits (e.g. 256 KB). The transformer will not be used if the source file is larger than the specified size.
Subset of docx transformations
getTransformationsByTransformer
- getTransformationsByTransformer(transformerName, use)
- Lists all possible transformations sorted by Transformer name
- - transformerName to be checked. If blank all transformers are included
- - use or context in which to test the transformation ('doclib', 'index', 'webpreview', 'syncRule', 'asyncRule'...) or blank for the default.
Lists all the transformations that a transformer may perform for a given use, or those of all transformers if the transformer name is not specified. The following example includes all (42) transformers. As can be seen, the count on the left lists the number of transformers and the next number counts the different combinations that the transformer supports. The output also includes the priority and any file size limits (e.g. 256 KB).
Subset of transformations each transformer can perform
getTransformationStatistics
- getTransformationStatistics(transformerName, sourceExtension, targetExtension)
- Lists the transformation statistics for the current node.
- - transformerName to be checked. If blank all transformers are included
- - sourceExtension to be checked. If blank all source mimetypes are included
- - targetExtension to be checked. If blank all target mimetypes are included
Lists statistics about the transformations that have taken place on the current node (not shared across the cluster) since the Transformer sub system started. This may be restricted by transformer, source and target extensions. If left blank all are considered. The following extracts are taken from the output when all values had been left blank. The first line shows that there have been 60 top level transformation requests (this does not include component transformations performed as part of a higher level transform), no errors and the average time was 1490 ms. The following lines show similar information between specific mimetypes regardless of the transformer used and do include component transformers.
Cut down statistics on the system as a whole
The following output was further down and shows similar data for each transformer in turn and also includes component transformers. In the case of the HtmlParser transformer there is no summary line (* *) as it has only ever been asked to transform between html and txt.
Cut down statistics by transformer
testTransform
- testTransform(transformerName, sourceExtension, targetExtension, use)
- Transforms a small test file from one mimetype to another and then shows the debug of the transform, which would indicate if it was successful or even if it was possible.
- - transformerName to be used. If blank the ContentService is used to select one.
- - sourceExtension used to identify the mimetype
- - targetExtension used to identify the mimetype
- - use or context in which to test the transformation ('doclib', 'index', 'webpreview', 'syncRule', 'asyncRule'...) or blank for the default.
Performs a transformation of a source type to a target type using limits for a specific use. The transformer may be specified or, if left blank (the normal case), will be selected automatically. The following example is from docx to swf. As can be seen there is only one transformer available and it is only able to handle files up to 256 KB.
Test transform from docx to swf
There are test files for the following extensions: 3g2 3gp acp asf avi bmp doc docx eml eps flv gif html ibooks jpg m4a m4v mov mp3 mp4 mpg msg odf odg odp ods odt ogg ogv otg otp ots ott pdf png ppt pptx pub sda sdc sdd sdw smf sxc sxd sxi sxm sxw tar tiff txt vdx vsd webm wma wmv wpd xls xlsx xml zip
Transformation Properties
Types
The following property types may be set (bold used for the more important ones).
- priority of the transformation.
- Using times rather than priority
- thresholdCount the number of transformations attempted before average time is used to select the transformer
- count sets the count of transformations on start up to a value other than 0
- time sets the average transformation time on start up to a value other than 0
- errorTime a false time given to a transformation if it fails
- available indicates if the transformation may be automatically selected.
- supported indicates which source and target mimetypes are supported
- Dynamic transformers
- pipeline that uses intermediate mimetypes and sub transformers
- failover that defines the order of transformers to be tried
- Transformation Limits
- maxSourceSizeKBytes the maximum source file size that will be attempted.
- timeoutMs a timeout on reading data from the source file.
- maxPages the maximum number of pages that may be read (or created) before an Exception is thrown.
- readLimitKBytes the read limit in terms of size after which end-of-file (EOF) is returned.
- readLimitTimeMs the read limit in terms of time after which end-of-file (EOF) is returned.
- pageLimit the read (or creation) limit in terms of pages.
Levels
To avoid having to supply too many properties, values may be specified at different levels and default to deeper levels.
- content.transformer.default system wide defaults
- content.transformer.default.extensions.<source>.<target> system wide defaults between mimetypes (using extensions)
- content.transformer.default.mimetypes.<source>.<target> system wide defaults between mimetypes
- content.transformer.<transformer name> default transformer values
- content.transformer.<transformer name>.extensions.<source>.<target> values between mimetypes (using extensions) for a transformer
- content.transformer.<transformer name>.mimetypes.<source>.<target> values between mimetypes for a transformer
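The idea of falling back through levels can be sketched as a lookup that tries the most specific property name first. The following Python fragment is an assumption-laden illustration: the exact precedence between levels is not spelled out on this page, so the order used here (transformer plus extensions, transformer, default plus extensions, default) is a guess, and wildcards and the mimetypes form are ignored. The function name is hypothetical.

```python
def resolve(properties, transformer, source, target, suffix):
    """Return the first value found, from most to least specific level."""
    prefix = "content.transformer"
    candidates = [
        # Assumed precedence order, most specific first.
        f"{prefix}.{transformer}.extensions.{source}.{target}.{suffix}",
        f"{prefix}.{transformer}.{suffix}",
        f"{prefix}.default.extensions.{source}.{target}.{suffix}",
        f"{prefix}.default.{suffix}",
    ]
    for name in candidates:
        if name in properties:
            return properties[name]
    return None
```

With the PdfBox values listed under Using priority to select a transformer, a pdf to txt lookup would give 50 while a pdf to html lookup would fall back to the transformer level value of 110.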
Uses
Transformation limits may also be defined on a per use basis by supplying a suffix for different uses.
- content.transformer...<transformer limit>.use.index a limit used when indexing content
- content.transformer...<transformer limit>.use.webpreview a limit used when previewing content in Share
- ...
Using priority to select a transformer
Each transformer may be given a priority which is used to determine the order in which transformers are tried. On failure the next highest one is tried. Prior to 4.2d this was based on the time taken to perform a transformation (see Using time to select a transformer which still applies if there are multiple transformers with the same priority) and a concept of 'explicit' transformations. The default transformer configuration in 4.2d has defined priorities so that there should only ever be one transformer with the highest priority.
Prior to 4.2d transformations marked as explicit would result in all non explicit transformers being ignored when it came to selection. The concept has now been replaced by such transformations being given a priority of 50 which is higher than the default of 100. The highest priority is 0.
Priorities may be defined at any level but normally are only defined in the two transformer levels, as content.transformer.default.priority=100 is defined by default. There may be different priorities for different source and target mimetypes and for the transformer as a whole, as can be seen in the default configuration for PdfBox in the following example. The default configuration also includes a priority for the Alfresco Remote transformation server, content.transformer.remoteServer.priority=40, so that it will be used if installed.
# iWorksQuicklooks
# ----------------
# Only supports key, numbers or pages to jpg or pdf
# content.transformer.iWorksQuicklooks.extensions.key.jpg.priority=50
# content.transformer.iWorksQuicklooks.extensions.key.pdf.priority=50
# content.transformer.iWorksQuicklooks.extensions.numbers.jpg.priority=50
# content.transformer.iWorksQuicklooks.extensions.numbers.pdf.priority=50
# content.transformer.iWorksQuicklooks.extensions.pages.jpg.priority=50
# content.transformer.iWorksQuicklooks.extensions.pages.pdf.priority=50
# PdfBox
# ------
# Only supports pdf to txt, html, xhtml or xml
# content.transformer.PdfBox.priority=110
# content.transformer.PdfBox.extensions.pdf.txt.priority=50
# remoteServer
# ------------
# Unregistered transformer
# content.transformer.remoteServer.priority=40
Use of priorities
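The ordering rule (lowest priority number first, with the average time used to separate transformers that share a priority) can be expressed as a sort. The following Python sketch is an illustration under those stated rules only; the function name and the tuple layout are hypothetical.

```python
def selection_order(candidates):
    """Order candidate transformers for a single transformation request.

    candidates: list of (name, priority, average_ms) tuples.
    A lower priority number is tried first; the average time only
    matters when two candidates share the same priority.
    """
    ranked = sorted(candidates, key=lambda c: (c[1], c[2]))
    return [name for name, _priority, _average_ms in ranked]
```

For the pdf to txt case described for the debug log, PdfBox (50) would be listed before TikaAuto (120).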
Using time to select a transformer
When there are multiple transformers with the same priority the system will select one of them based on the time taken in the past. This was the basic approach taken prior to 4.2d for all transformers. It resulted in a non deterministic selection of transformers once each transformer had been run once, because after one transformation none of the transformers would have an average transformation time of 0 ms. The time was also recorded per transformer. If there was an error, an error time of 2 minutes would be recorded, the idea being that it would make it less likely that the transformer would be selected again. As a result, if the first file transformed was large or failed, it would be unlikely that the transformer would ever be selected again, even for different mimetypes.
In 4.2d the average time is still recorded, but for the combination of transformer, source mimetype and target mimetype. See getTransformationStatistics. There is also a thresholdCount property that defines the number of transformations that must be performed before the average time is used. This may be defined at any level but there is a system wide default content.transformer.default.thresholdCount=3. An error time (in milliseconds) is still used on failure as defined by an errorTime property. Again this may be defined at any level but there is a system wide default defined content.transformer.default.errorTime=120000.
As the average time statistics are cleared on restart of the Transformer sub system which takes place on changing or removing a property it is also possible to initialise the average time and the number of transformations performed by setting time and count properties at any level. This might allow a system to be configured to switch to another transformation if the normal transformer unexpectedly performs worse than expected.
# 10 minutes
content.transformer.default.errorTime=600000
# Use a high number so that the average time does not change much if used.
content.transformer.myTransformer.count=1000000
# 10 seconds
content.transformer.myTransformer.time=10000
Time properties
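The effect of seeding count and time can be seen with a little arithmetic. The following Python sketch assumes the average is a plain running mean and that a failure is recorded as one transformation taking errorTime milliseconds; both are assumptions made for illustration and the class is hypothetical, not Alfresco code.

```python
class TransformerStats:
    """Running average of transformation times for one combination."""

    def __init__(self, count=0, time_ms=0.0, error_time_ms=120000):
        self.count = count                  # seeded by the count property
        self.time_ms = float(time_ms)       # seeded by the time property
        self.error_time_ms = error_time_ms  # errorTime property

    def record(self, elapsed_ms):
        """Fold one completed transformation into the running mean."""
        total = self.time_ms * self.count + elapsed_ms
        self.count += 1
        self.time_ms = total / self.count

    def record_failure(self):
        """Assumption: a failure counts as one run taking errorTime."""
        self.record(self.error_time_ms)
```

With count=1000000 and time=10000, a single 20 second transformation moves the average by only about 0.01 ms, which is why a high count keeps the seeded time nearly fixed.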
Disabling a transformer
Transformers may only be selected if they are marked as available. If not available they are only used if referenced by name as a component transformer of either a failover or complex (pipeline) transformer. The following is an example of a standard transformer called double.ImageMagick that is only ever used as a component. The default value for availability is true so this property need only be defined for unavailable transformers. This type of property may be used to turn off standard transformers if a custom transformer has been added in its place. The priority of unavailable transformers appears to be unset [--] when viewed via getProperties even if a priority has been defined, as it has no meaning because it is not used to select the transformer.
# double.ImageMagick
# ------------------
# content.transformer.double.ImageMagick.pipeline=ImageMagick|png|ImageMagick
# content.transformer.double.ImageMagick.available=false
Example of a transformer unavailable by default
The other way to turn off a transformer is to give it a lower priority (higher number) than another transformer that can perform the same transformations. However it will still be used if the other transformer fails or has limits that restrict its use.
Supported transformations
A transformer may define in Java the transformations of mimetypes it supports. This may be further restricted by supported properties defined at either transformer or system wide levels and may include * as a wildcard for a source or target mimetype. Properties using either extensions or mimetypes may be used.
For backward compatibility with earlier versions (which had both supported and unsupported lists), two lists are still maintained so that they may be combined together in the same way: one for supported (supported=true) and one for unsupported (supported=false). As a result there is a third case when there is no supported property.
- When neither list exists (there are no supported properties) the transformations are not further restricted.
- When combining the two lists the unsupported list overrides the supported list in order to restrict the transformer.
# disable all transformations from docx by default
content.transformer.default.extensions.docx.*.supported=false
# the xyz transformer is restricted to only those going to pdf
content.transformer.xyz.extensions.*.pdf.supported=true
# the abc transformer is restricted to only those from pdf to any other type except png
content.transformer.abc.extensions.pdf.png.supported=false
content.transformer.abc.mimetypes.application/pdf.*.supported=true
Supported examples
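The way the two lists combine can be sketched as follows. This Python fragment is an interpretation of the rules above, not Alfresco code: it assumes that once a supported list exists only matching pairs are allowed, that the unsupported list always wins, and that * matches any extension. The function name and rule layout are hypothetical.

```python
def is_supported(rules, source, target):
    """Apply supported/unsupported rules to one source and target pair.

    rules: list of (source_pattern, target_pattern, supported) tuples,
    where a pattern is an extension or "*".
    """
    def matches(pattern, value):
        return pattern == "*" or pattern == value

    def any_match(flag):
        return any(
            matches(s, source) and matches(t, target)
            for s, t, supported in rules
            if supported is flag
        )

    if any_match(False):
        return False  # the unsupported list overrides the supported list
    if any(supported for _s, _t, supported in rules):
        return any_match(True)  # a supported list restricts what is allowed
    return True  # neither list restricts this pair
```

Modelling the abc transformer as ("pdf", "png", False) plus ("pdf", "*", True) allows pdf to txt but rejects pdf to png and anything that does not start from pdf.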
Pipeline transformers
A pipeline transformer or Complex transformer is made up of component transformers that transform the source content to the target via intermediate mimetypes. The component transformers are normally named but an * may be used to indicate that any available (see Disabling a transformer) transformer may be used. As can be seen below further properties may be defined. As a result it is possible to dynamically create transformers.
# JodConverter.Html2Pdf
# ---------------------
# content.transformer.JodConverter.Html2Pdf.pipeline=JodConverter|odt|JodConverter
# content.transformer.JodConverter.Html2Pdf.extensions.html.pdf.priority=50
# content.transformer.JodConverter.Html2Pdf.extensions.html.pdf.supported=true
Example pipeline transformer
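A pipeline value alternates transformer names and intermediate extensions separated by |. The following Python sketch shows how such a value could be broken into individual steps; it is an illustration of the format only, and the function name is hypothetical.

```python
def pipeline_steps(pipeline, source, target):
    """Expand a pipeline value into (transformer, source, target) steps."""
    parts = pipeline.split("|")
    transformers = parts[0::2]   # names (or "*") at even positions
    intermediates = parts[1::2]  # intermediate extensions at odd positions
    if len(transformers) != len(intermediates) + 1:
        raise ValueError("a pipeline must start and end with a transformer")
    sources = [source] + intermediates
    targets = intermediates + [target]
    return list(zip(transformers, sources, targets))
```

For JodConverter|odt|JodConverter with html as the source and pdf as the target, this gives a first step from html to odt and a second step from odt to pdf.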
Failover transformers
A failover transformer defines the order in which its component transformers should be tried, rather than using the normal priority approach. The component transformers must be named and normally are not all available (see Disabling a transformer). If one fails or is unavailable (perhaps because of transformer limits) the next one is tried. As with pipeline transformers further properties may be defined and it is possible to dynamically create them.
# JodConverter.2Pdf
# -----------------
# content.transformer.JodConverter.2Pdf.available=false
# content.transformer.JodConverter.2Pdf.failover=JodConverter|JodConverter.Html2Pdf
# content.transformer.JodConverter.2Pdf.priority=150
# content.transformer.JodConverter.2Pdf.extensions.*.pdf.supported=true
Example failover transformer
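The failover behaviour amounts to trying each named component in the listed order until one succeeds. The following Python sketch illustrates that control flow only; the function name is hypothetical and the callables stand in for real transformers.

```python
def run_failover(components, content):
    """Try each (name, transform) pair in order and return the first success.

    Returns (name_of_transformer_used, result). Raises RuntimeError if
    every component fails.
    """
    errors = []
    for name, transform in components:
        try:
            return name, transform(content)
        except Exception as exc:  # any failure moves on to the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all failover components failed: " + "; ".join(errors))
```

For the JodConverter.2Pdf example, the component list would hold JodConverter followed by JodConverter.Html2Pdf.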
Transformer Limits
Transformation limits are generally used to protect the server from attempting transformations that consume too many resources or don't return quickly enough when it is known that a user will be waiting for the result.
There are six possible limits. As can be seen there are really three pairs of values (time, size and page). Only one of each pair may be set at any level.
- Maximum source size (KB) If a source file is larger than the specified size the transformation will not be attempted.
- Timeout (ms) A timeout on reading data from the source file.
- Maximum pages The maximum number of pages that may be read (or created) before an Exception is thrown.
- Read limit size(KB) A read limit in terms of size after which end-of-file (EOF) is returned.
- Read limit time (ms) A read limit in terms of time after which end-of-file (EOF) is returned.
- Page limit A read (or creation) limit in terms of pages.
Timeout and Read limit time only work with transformers that don't bulk read their source data, as these limits are enforced by a modified InputStream that either throws an Exception or returns EOF early. The OpenOffice and JOD transformers cannot currently make use of these options as they create temporary files to work with, but the JOD transformers also have their own timeout, set to two minutes by default.
With Read limit size and Read limit time, the source file is effectively truncated, so these limits should only be used with mimetypes where it is acceptable to truncate the file without resulting in an invalid format error.
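The difference between throwing an exception and returning EOF early can be sketched with a stream wrapper. This is an illustrative Python analogue of the idea, not the Alfresco InputStream. The class name LimitedReader, its parameters and the injectable clock are all assumptions made for the example. Negative values mean unlimited.

```python
import io
import time


class LimitedReader:
    """Wrap a binary stream and enforce read limits on it (illustrative only)."""

    def __init__(self, stream, read_limit_bytes=-1, read_limit_s=-1,
                 timeout_s=-1, clock=time.monotonic):
        self._stream = stream
        self._remaining = read_limit_bytes
        self._read_limit_s = read_limit_s
        self._timeout_s = timeout_s
        self._clock = clock
        self._start = clock()

    def read(self, n=-1):
        elapsed = self._clock() - self._start
        # Timeout: fail the whole transformation.
        if 0 <= self._timeout_s < elapsed:
            raise TimeoutError("transformation timeout exceeded")
        # Read limit time: pretend the source has ended.
        if 0 <= self._read_limit_s < elapsed:
            return b""
        # Read limit size: never hand out more than the remaining allowance.
        if self._remaining >= 0:
            n = self._remaining if n < 0 else min(n, self._remaining)
        data = self._stream.read(n)
        if self._remaining >= 0:
            self._remaining -= len(data)
        return data


reader = LimitedReader(io.BytesIO(b"abcdefgh"), read_limit_bytes=5)
print(reader.read())  # only the first five bytes are returned
print(reader.read())  # further reads look like end-of-file
```

A consumer of the wrapped stream cannot tell a truncated source from a short one, which is why truncation is only safe for formats that tolerate it.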
The Maximum pages and Page limit values are only meaningful to transformers that know about pages. Initially this was only the TextToPdfContentTransformer. In such cases, when a transformer is asked if it can perform a transformation it will ignore the 'Maximum source size' value if either 'page' value is set. This allows the transformer to effectively truncate the source file, so the actual size is not meaningful. Although the Page limit implies a number of pages, transformers may interpret the value in their own terms, such as characters, words, blocks or even pages.
content.transformer.default.timeoutMs=180000
content.transformer.default.extensions.txt.*.maxSourceSizeKBytes=1048576
content.transformer.JodConverter.maxSourceSizeKBytes=102400
content.transformer.JodConverter.extensions.doc.pdf.maxSourceSizeKBytes=10240
content.transformer.JodConverter.extensions.doc.pdf.maxSourceSizeKBytes.use.asyncRule=102400
Example of setting transformation limits at different levels and uses
Selecting the limits
Unfortunately there are no fixed rules for selecting the correct transformation limits.
- However, it is generally worth considering setting limits for worst case test files (for example a document that contains just text and no images) so that it is not possible for a user to upload a file that will consume so many resources that other users will be adversely affected. Transformation limits were originally introduced to avoid transformations of large Excel files to text, as these were consuming 100% CPU.
- Another thing to consider is what would be a reasonable time for a user to wait for the result, and whether there is a timeout on the user interface that would make returning a result impossible anyway. This is why some of the JodConverter and OpenOffice limits exist: to allow a Share preview (using swf) to take place in less than 2 minutes.
- Setting general system wide limits is a good idea if there are multiple transformers that can perform the same transformation, unless it is known that the performance characteristics are dramatically different.
Setting limits at different levels
Limits may be defined at a number of levels with values defaulting from one level to the next if a more specific value has not been specified:
- System wide
- System wide between mimetypes
- Per transformer
- Between mimetypes for a given transformer
content.transformer.default.timeoutMs=120000 // system wide
content.transformer.default.extensions.*.txt.maxSourceSizeKBytes=102400 // system wide between mimetypes
content.transformer.JodConverter.maxSourceSizeKBytes=4096 // per transformer
content.transformer.JodConverter.extensions.doc.pdf.maxSourceSizeKBytes=10240 // between mimetypes for a given transformer
- content.transformer.<transformer name>.<limit>
- System wide limits are defined by using a property with a transformer named 'default'
- content.transformer.<transformer name>.<extensions>.<source>.<target>.<limit>
content.transformer.<transformer name>.<mimetypes>.<source>.<target>.<limit>
- Limits between mimetypes are defined using either the file extension or the mimetype. If one property is defined using extensions and another using mimetypes and both describe the same mimetype the extensions value is ignored. extensions and mimetypes may also use '*' as a wild card to represent zero or more characters. For example: 'image/*' or 'doc*'.
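The defaulting between levels can be sketched as a lookup that tries the most specific property first. This Python sketch is only an approximation under stated assumptions: it handles the extensions form only, it takes the first matching wildcard property in dictionary order (the real precedence between overlapping wildcards is not described here), and the names mimetype_level and resolve_limit are hypothetical.

```python
from fnmatch import fnmatchcase

UNLIMITED = -1  # a negative value means unlimited


def mimetype_level(props, name, source, target, limit):
    """Find a 'between mimetypes' value for one transformer name, or None."""
    prefix = f"content.transformer.{name}.extensions."
    suffix = f".{limit}"
    for key, value in props.items():
        if key.startswith(prefix) and key.endswith(suffix):
            parts = key[len(prefix):len(key) - len(suffix)].split(".")
            # '*' in the property acts as a wildcard for the extension.
            if (len(parts) == 2 and fnmatchcase(source, parts[0])
                    and fnmatchcase(target, parts[1])):
                return int(value)
    return None


def resolve_limit(props, name, source, target, limit):
    """Walk from the most specific level to the system wide level."""
    for candidate in (name, "default"):
        value = mimetype_level(props, candidate, source, target, limit)
        if value is not None:
            return value
        value = props.get(f"content.transformer.{candidate}.{limit}")
        if value is not None:
            return int(value)
    return UNLIMITED


props = {
    "content.transformer.default.timeoutMs": "120000",
    "content.transformer.default.extensions.*.txt.maxSourceSizeKBytes": "102400",
    "content.transformer.JodConverter.maxSourceSizeKBytes": "4096",
    "content.transformer.JodConverter.extensions.doc.pdf.maxSourceSizeKBytes": "10240",
}
print(resolve_limit(props, "JodConverter", "doc", "pdf", "maxSourceSizeKBytes"))
print(resolve_limit(props, "JodConverter", "xls", "pdf", "maxSourceSizeKBytes"))
```

With the example properties above, doc to pdf through JodConverter picks up the mimetype-specific value, while xls to pdf falls back to the per transformer value.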
Setting limits based on use
In addition to setting limits at different levels, it is also possible to specify values for a specific use. Currently the defined uses are:
- index when extracting text so that it may be indexed
- doclib for icons in Share's Document Library (this is the name of the thumbnail)
- webpreview for Share's Preview page when swf is used (this is the name of the thumbnail)
- imgpreview for Share's Preview page when png is used (swf is not available) (this is the name of the thumbnail)
- asyncRule when a background rule is executing
- syncRule when a foreground rule is executing.
The following would disable all thumbnail transformations for Share's Document Library.
content.transformer.default.maxSourceSizeKBytes.use.doclib=0
These properties are set when there is a need to open up or close down a limit for a specific use. For example, indexing, which runs in the background, could be allowed more time or larger source files than a foreground task. Depending on the level of the properties with use values (system wide default, transformer default or between mimetypes for a transformer), properties that don't have a use value may or may not be considered when a transformation is called for that specific use case. This avoids having to duplicate lots of properties.
If a system wide property with a use value is defined only properties that include the same use are considered.
content.transformer.default.extensions.pdf.swf.maxSourceSizeKBytes.use.index=777
content.transformer.TikaAuto.timeoutMs.use.index=120000
// The following are all ignored when a transform is performed for the indexer.
content.transformer.default.extensions.doc.txt.maxSourceSizeKBytes=1234
content.transformer.TikaAuto.timeoutMs=60000
content.transformer.default.extensions.pdf.swf.maxSourceSizeKBytes=6666
content.transformer.default.extensions.pdf.swf.maxSourceSizeKBytes.use.webpreview=5555
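The system wide rule just illustrated can be approximated as a filter over the property set. This Python sketch models only that first rule and is not the Alfresco implementation. The function name considered_for_use is hypothetical, and the fallback branch (dropping only properties for other uses when no system wide use property exists) is an assumption.

```python
def considered_for_use(props, use):
    """Return the properties that remain in play for the given use."""
    marker = f".use.{use}"
    system_wide_use = any(
        key.startswith("content.transformer.default.") and key.endswith(marker)
        for key in props
    )
    if system_wide_use:
        # Rule from the text: only properties with the same use are considered.
        return {k: v for k, v in props.items() if k.endswith(marker)}
    # Assumption: otherwise only properties for other uses are dropped.
    return {k: v for k, v in props.items() if ".use." not in k or k.endswith(marker)}


props = {
    "content.transformer.default.extensions.pdf.swf.maxSourceSizeKBytes.use.index": "777",
    "content.transformer.TikaAuto.timeoutMs.use.index": "120000",
    "content.transformer.default.extensions.doc.txt.maxSourceSizeKBytes": "1234",
    "content.transformer.TikaAuto.timeoutMs": "60000",
    "content.transformer.default.extensions.pdf.swf.maxSourceSizeKBytes": "6666",
    "content.transformer.default.extensions.pdf.swf.maxSourceSizeKBytes.use.webpreview": "5555",
}
for key in sorted(considered_for_use(props, "index")):
    print(key)
```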
If a transformer default property with a use value is defined, none of the properties for that transformer without the same use value will be considered.
content.transformer.TikaAuto.timeoutMs.use.index=120000
content.transformer.OOXML.timeoutMs.use.index=120000
content.transformer.default.extensions.doc.txt.maxSourceSizeKBytes=1234
// The following are ignored when a transform is performed for the indexer.
content.transformer.TikaAuto.timeoutMs=60000
content.transformer.TikaAuto.extensions.pdf.swf.maxSourceSizeKBytes=5555
content.transformer.default.extensions.pdf.swf.maxSourceSizeKBytes.use.webpreview=5555
If a property with a use value is defined for a specific transformer and source and target mimetype combination, it will be used in preference to the same property combination without the use value.
content.transformer.TikaAuto.extensions.pdf.swf.maxSourceSizeKBytes.use.index=5555
content.transformer.TikaAuto.timeoutMs.use.index=120000
content.transformer.default.extensions.doc.txt.maxSourceSizeKBytes=1234
// The following is ignored when a transform is performed for the indexer.
content.transformer.TikaAuto.extensions.pdf.swf.maxSourceSizeKBytes=3333
Black list content so it is not transformed
5.1a, 4.2.5 and 5.0.2 allow selected content to be black listed so that it is not transformed. These properties should only be used as a temporary measure to allow problematic content to then be removed from the repository. The mechanism is not designed to scale to more than a handful of entries. Typically it is used to exclude content that has triggered a transformer performance issue.
Blacklist properties follow the same pattern as other transformer properties, so they may be specified system wide, per transformer, between specific mimetypes (or extensions) and even by use. As can be seen in the following example, the property contains a list of node refs. Only the id part of the node ref is used, so that content is still excluded if moved to another content store (such as the Archive store). The following is an example of a system wide property that stops two node refs using any pdf to txt transformer.
content.transformer.default.extensions.pdf.txt.blacklist=workspace://SpacesStore/4657565c-b1b0-478e-9f67-aa9f13af47f7,archive://SpacesStore/a81c1db0-5eaf-4790-99b6-1889903ae316
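Because only the id part of each node ref is compared, the check can be pictured as follows. This is an illustrative Python sketch, not Alfresco code. The function names are hypothetical, and it assumes the id is the text after the last '/' in the node ref.

```python
def blacklisted_ids(value):
    """Extract the id part of each node ref in a comma-separated blacklist value."""
    return {ref.strip().rsplit("/", 1)[-1] for ref in value.split(",") if ref.strip()}


def is_blacklisted(node_ref, value):
    """True if the node ref's id appears in the blacklist, whatever its store."""
    return node_ref.rsplit("/", 1)[-1] in blacklisted_ids(value)


blacklist = ("workspace://SpacesStore/4657565c-b1b0-478e-9f67-aa9f13af47f7,"
             "archive://SpacesStore/a81c1db0-5eaf-4790-99b6-1889903ae316")

# The first entry is listed under the workspace store but is still matched
# after the node has moved to the archive store.
print(is_blacklisted("archive://SpacesStore/4657565c-b1b0-478e-9f67-aa9f13af47f7", blacklist))
```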
There are a number of ways to obtain the node refs that need to be specified, but one of the simplest is to look at the TransformerDebug, which was probably used to identify the performance problem in the first place.
More detail
Combining Limits
When a transformer is used to perform a rendition (create an image or thumbnail) the thumbnail definition may also have limits. These are combined with those of the transformer. A similar thing takes place with the limits of a complex (pipeline) transformer and the limits of its first sub transformer. The smaller value of each limit pair (time, size or page) is used. The other value in the pair becomes unlimited, as it should never be reached. An unlimited limit value, represented by a negative number (-1 is normally used), also loses out to a value greater than or equal to 0.
For example if we combine a Maximum Source Size of 1 MB with a Read Limit size value of 3 MB, the combined limits have a Maximum Source Size of 1 MB and an unlimited Read Limit size.
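The combination rule for one pair can be written out as a short Python sketch. Each pair is represented here as a (maximum, read limit) tuple in KB, with -1 meaning unlimited. The function names are hypothetical, and the tie-break (keeping the maximum when both values are equal) is an assumption not covered by the description above.

```python
UNLIMITED = -1


def smaller(a, b):
    """Smaller of two values, where a negative value means unlimited."""
    candidates = [v for v in (a, b) if v >= 0]
    return min(candidates) if candidates else UNLIMITED


def combine_pair(pair_a, pair_b):
    """Combine two (maximum, read limit) pairs into one."""
    max_value = smaller(pair_a[0], pair_b[0])
    limit_value = smaller(pair_a[1], pair_b[1])
    if max_value < 0 or limit_value < 0:
        # At most one side is set, so there is nothing to choose between.
        return (max_value, limit_value)
    if max_value <= limit_value:  # tie-break is an assumption
        return (max_value, UNLIMITED)
    return (UNLIMITED, limit_value)


# Maximum Source Size of 1 MB combined with a Read Limit size of 3 MB.
print(combine_pair((1024, UNLIMITED), (UNLIMITED, 3072)))
```

The call reproduces the example above: the 1 MB maximum survives and the read limit becomes unlimited.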
As already noted, when a transformer that 'understands' page limits is asked if it can perform a transformation it will ignore the 'Maximum source size' value if either 'page' value is set. It does this because the page limit is effectively truncating the source file, so the actual size is not meaningful.
In 4.2d transformer level limits act as defaults for transformer, source mimetype and target mimetype limits. Prior to 4.2d they would have been combined (as just described). The reason was that prior to 4.2d, limits were defined in Spring and it was quite difficult to add new ones, so combining rather than defaulting was the better option. As adding new limits is simple in 4.2d, defaulting is a better option as it is much simpler to explain. In practice, combining Alfresco defined limits prior to 4.2d was identical to the new default method, as the more limiting values were always at the mimetype level.
Complex transformer Limits
It is generally a good idea to set 'Maximum source sizes' on complex transformers and their individual component transformers (there is no need to set one on the first component). If a 'Maximum source size' is not set on a complex transformer, there is a risk that a number of sub transformations will take place only to fail later. For example, it might be a good idea to set a 1 MB source file size limit on a complex transformer between xls and swf via pdf, if the transformer between pdf and swf has a 1.25 MB limit. The reason is that the transformation from xls to pdf tends to create a pdf only slightly bigger than the original xls. 1 MB is a 'safe' value that avoids the first transformation being performed only to fail on the second.
Failover transformer limits
In addition to any limits set on a failover transformer, the 'Maximum source size' is also combined with those of all component transformers. The highest component value is effectively combined with the failover transformer's own value.
Transformation Rules
Rules that include transformation will fail if there are no transformers available or the transformer fails for some other reason. Historically, rules have failed if the transformation failed. Unless one is careful, setting limits may make these rules fail more frequently.
It should be noted that rules that run synchronously with the initial action such as copying a file into a folder run within the same transaction, so if the transformation fails because there are no transformers available, the file will not appear to be copied into the folder. Asynchronous rules however run in their own transaction so the file will be copied to the folder but the transformation will not appear to take place.
Deprecated Spring Configuration
In previous versions, transformation limits, supported transformations, unsupported transformations and explicit transformations could be defined in Spring. This approach has now been deprecated. If there are custom Spring files in existence, the system will dynamically set properties to achieve the same result on start up. An error message will be logged to show the properties added.
ERROR [...ContentTransformerHelper] In order to support dynamic setting of transformer options, Spring XML configuration
ERROR [...ContentTransformerHelper] is no longer used to initialise these options.
ERROR [...ContentTransformerHelper]
ERROR [...ContentTransformerHelper] Your system appears to contains custom Spring configuration which should be replace by
ERROR [...ContentTransformerHelper] the following Alfresco global properties. In the case of the Enterprise edition these
ERROR [...ContentTransformerHelper] values may then be dynamically changed via JMX.
ERROR [...ContentTransformerHelper]
ERROR [...ContentTransformerHelper] content.transformer.customTransformerUsingDeprecatedProperties.extensions.csv.pdf.supported=true
ERROR [...ContentTransformerHelper] content.transformer.customTransformerUsingDeprecatedProperties.extensions.xml.pdf.supported=true
ERROR [...ContentTransformerHelper] content.transformer.customTransformerUsingDeprecatedProperties.maxSourceSizeKBytes=10240
Example start up log message when there is custom Spring configuration
<bean id='transformer.customTransformerUsingDeprecatedProperties'
class='org.alfresco.repo.content.transform.TextToPdfContentTransformer'
parent='baseContentTransformer' >
<property name='supportedTransformations'>
<list>
<bean class='org.alfresco.repo.content.transform.SupportedTransformation' >
<property name='sourceMimetype'><value>text/csv</value></property>
<property name='targetMimetype'><value>application/pdf</value></property>
</bean>
<bean class='org.alfresco.repo.content.transform.SupportedTransformation' >
<property name='sourceMimetype'><value>text/xml</value></property>
<property name='targetMimetype'><value>application/pdf</value></property>
</bean>
</list>
</property>
<property name='maxSourceSizeKBytes'><value>10240</value></property>
</bean>
Example custom transformer Spring bean using deprecated properties
Using the deprecated approach will be fine as long as other custom properties take this into account. The properties dynamically set on start up will be reset if changes are made via JMX which restart the transformer subsystem.
Note: The concept of explicit transformations no longer exists, but it is simulated by setting a transformer priority of 50 (the default is 100) and adding supported transformer properties. Setting a higher priority (