Introducing Alfresco Transform Service - new with ACS 6.1

janv
Alfresco Employee

Here's a brief overview of the new Alfresco Transform Service that is being introduced as an option for Alfresco Content Services (ACS) 6.1. It also includes evolutionary changes to the ACS Content Repository.

If you missed DevCon 2019, you can also refer to the slides from "Transforming Transformers in ACS 6.1 and beyond".
 
alfresco-transform-service-devcon-2019-1-638.jpg
 
The video can be found here.

 
I will try to come back and update this blog post with links to other related slides & videos such as:
 
  • Architectural Changes in Repo 6.1 & beyond slides ( Alex Mukha & Stefan Kopf )
  • Transform Graph ( Lucian Tuca )
  • Porting Custom Transformer to new Transform Service slides ( Alan Davis )
If you've been closely following the ACS reference deployment projects, you may already have noticed the new Transform Service. This is available now alongside the ACS 6.1 release.
 
helm-components.png
 
The Transform Service is an optional component that is enabled by default in the ACS reference deployments, whether using Docker Compose, Helm / Kubernetes or the AWS EKS stack.
 
Alternatively, you can access it via the ACS download trial.
 

Overview of the Transform Service
 
The following overview is extracted from the documentation. Thanks Adelaide Nxumalo.
 
The Transform Service handles the essential transforms for content such as Microsoft Office documents, images, and PDFs. Target formats include PNG for thumbnails, and PDF and JPEG for downloads and previews.
 
The following diagram shows a simple representation of the Transform Service components:
  
transform-simple-architecture.png

This shows an example implementation of how you can deploy into AWS, using a number of managed services:

  • Amazon EKS - Elastic Container Service for Kubernetes
  • Amazon MQ - Managed message broker service for Apache ActiveMQ
  • Amazon EFS - Amazon Elastic File System
You can replace the AWS services (EKS, MQ, and EFS) with a self-managed Kubernetes cluster, ActiveMQ (configured with failover), and a shared file store, such as NFS.
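
As a rough illustration of the "configured with failover" part: clients commonly reach a set of ActiveMQ brokers through a failover transport URI, which lists the broker endpoints and lets the client reconnect to another one if the current connection drops. The sketch below only builds such a URI string. The host names are hypothetical, and the port and timeout values are assumptions for the example rather than values taken from the reference deployments.

```python
def failover_uri(brokers, timeout_ms=3000):
    """Build an ActiveMQ failover transport URI from (host, port) pairs."""
    if not brokers:
        raise ValueError("at least one broker is required")
    # Each endpoint uses the plain tcp:// transport here; a managed broker
    # may require a different scheme and port (assumption: check your setup).
    endpoints = ",".join(f"tcp://{host}:{port}" for host, port in brokers)
    return f"failover:({endpoints})?timeout={timeout_ms}"


# Hypothetical host names, used purely for illustration.
uri = failover_uri([
    ("mq-a.example.internal", 61616),
    ("mq-b.example.internal", 61616),
])
print(uri)
```

The repository and the Transform Router would each be pointed at the same broker address, so that requests and responses travel over the same broker.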

The main components of the Transform Service are:

  • Content Repository (ACS): This is the repository where documents and other content reside. The repository produces and consumes events destined for the message broker (such as ActiveMQ or Amazon MQ). It also reads and writes documents to the shared file store.
  • ActiveMQ: This is the message broker (either a self-managed ActiveMQ instance or Amazon MQ), where the repository and the Transform Router send image transform requests and responses. These JSON-based messages are then passed to the Transform Router.
  • Transform Router: The Transform Router allows simple (single-step) and pipeline (multi-step) transforms that are passed to the Transform Engines. The Transform Router (and the Transform Engines) run as independently scalable Docker containers.
  • Transform Engines: The Transform Engines transform files sent by the repository. Here are some example transformations for each Transform Engine (this is not an exhaustive list):
    • LibreOffice (e.g. docx to pdf)
    • ImageMagick (e.g. resize)
    • Alfresco PDF Renderer (e.g. pdf to png)
    • Tika (e.g. docx to plain text)
  • Shared File Store: This is used as temporary storage for the original source file (stored by the repository), intermediate files (for multi-step transforms), and the final transformed target file. The target file is retrieved by the repository after it's been processed by one or more of the Transform Engines.
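
To make the interplay between these components a little more concrete, here is a minimal in-memory sketch of the flow described above. It is a toy model, not the actual implementation: a dict stands in for the Shared File Store, in-process queues stand in for the message broker, the "engines" are plain functions called directly by the router, and the JSON field names (`request_id`, `source_ref`, and so on) are made up for illustration and are not the real message schema.

```python
import json
import queue
import uuid

# Hypothetical stand-ins: a dict for the Shared File Store, queues for the broker.
shared_file_store = {}
request_queue = queue.Queue()
reply_queue = queue.Queue()


# Toy engines: real T-Engines would invoke LibreOffice, Alfresco PDF Renderer, etc.
def libreoffice(data):
    return b"PDF(" + data + b")"


def pdf_renderer(data):
    return b"PNG(" + data + b")"


ENGINES = {"libreoffice": libreoffice, "pdfrenderer": pdf_renderer}

# Route table: (source, target) -> ordered engine names.
# One entry is a simple transform, several entries form a pipeline.
ROUTES = {
    ("docx", "pdf"): ["libreoffice"],
    ("pdf", "png"): ["pdfrenderer"],
    ("docx", "png"): ["libreoffice", "pdfrenderer"],
}


def store(data):
    """Write bytes to the file store and return a reference to them."""
    ref = str(uuid.uuid4())
    shared_file_store[ref] = data
    return ref


def repository_request(data, source_ext, target_ext):
    """Repository side: store the source file, then queue a JSON request."""
    request_id = str(uuid.uuid4())
    request_queue.put(json.dumps({
        "request_id": request_id,
        "source_ref": store(data),
        "source_ext": source_ext,
        "target_ext": target_ext,
    }))
    return request_id


def router_process_one():
    """Router side: take one request, run each step, queue a JSON reply."""
    request = json.loads(request_queue.get_nowait())
    steps = ROUTES[(request["source_ext"], request["target_ext"])]
    ref = request["source_ref"]
    for engine_name in steps:
        # Each step reads its input from the store and writes its output back,
        # so intermediate files of a pipeline also live in the store.
        ref = store(ENGINES[engine_name](shared_file_store[ref]))
    reply_queue.put(json.dumps({
        "request_id": request["request_id"],
        "target_ref": ref,
        "steps": len(steps),
    }))


def repository_collect():
    """Repository side: read one reply and fetch the target file."""
    reply = json.loads(reply_queue.get_nowait())
    return shared_file_store[reply["target_ref"]], reply


# A docx-to-png request has no single engine, so it runs as a two-step pipeline.
repository_request(b"report", "docx", "png")
router_process_one()
target, reply = repository_collect()
print(target, reply["steps"])
```

Note that only references travel in the messages while the file content stays in the store, which is why the repository, the router and the engines all need access to the same shared file store.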

For more details, please refer to the Transform Service documentation:

Please try out the ACS 6.1 reference deployments with the new Transform Service.

We look forward to your feedback. Please feel free to contact Preeti Nirwal or myself.

Thanks,

Jan Vonka‌ & the INT team (Integrations n Transformations)

 

9 Comments
lcolorado
Partner

When there is more than one Alfresco server (ACS), are all of them pointing to the same ActiveMQ broker?

janv
Alfresco Employee

Hi Luis,

Can you please describe your use-case scenario in a little more detail ? 

Do you mean a single clustered instance of ACS with more than one Repository server / node / container ? If so, then yes, each Repo server (in the cluster) will connect to the same ActiveMQ broker.

Or, do you mean more than one instance ("installation") of ACS ? If so, then no, each ACS instance should connect to its own ActiveMQ broker (at least for current Transform Service 1.0.x).

Hope this helps.

Regards,

Jan

lcolorado
Partner

Thanks for responding!

So that means that there should be only one broker per repository, regardless of the number of ACS servers.

Is the new transformation service used to extract text for Solr in Alfresco 6.x? In Alfresco 5.x, if a new Word document is ingested, Solr would request the text content, then Alfresco would transform it to text using LibreOffice, and would return the text to Solr.

janv
Alfresco Employee

You're welcome :-)

Yes, you only require one broker for one clustered repo. It can be used by Transform Service, Desktop Sync Service, etc.

Regarding your other question: as of ACS 6.1, the Search Service does not (yet) use the asynchronous event-driven Transform Service. The current search index content tracker uses a dedicated synchronous call to the repo to perform the "to text" transform.

We plan to (* usual caveat in terms of forward looking statements) provide a new asynchronous mechanism for the search index content tracker, hopefully in the next or a future ACS release.

Regards,

Jan

binduwavell
Partner

Hi Jan,

Thanks so much for this thread and your engagement on it!

I want to clarify something. When Solr needs to get a text rendition for, say, a Word document, it uses the dedicated/synchronous webscript call to fetch the text rendition. Assuming only external transformation is enabled, I would assume the webscript would ultimately be sending out a transform request on AMQ and receiving a transform response via AMQ, and then the webscript would return with the text rendition. If this is wrong, can you explain how this works differently than it used to work?

I have not had a chance to review the slides or video, so if details are there please tell me and I'll go dig through them. 

I was wondering if you could give any detail about the request/response flow through AMQ.

Say I have some code running in the repo and I request a transformation in a synchronous way. How is the async AMQ request/response handled and then marshaled back into the blocking transformation request?

-- Bindu

jpotts
Professional

Many customers are not on AWS and are not deploying with Docker, Docker Compose, or Kubernetes. This means they have to manually install everything, which is a huge pain. When installing using this method, by default, the transformation service is disabled. Where is the documentation that describes how to install/enable the transformation service under this deployment scenario?

janv
Alfresco Employee

Hi Jeff, 

Many thanks for the feedback. I have passed it on to Product Management. The Transform Service itself is currently part of ACS Enterprise so please feel free to contact Support (and PMs) directly with your specific customer requirements, if you haven't already.

In the meantime, there are a few things to consider ...

In theory, each (micro-) service can be run as a process (Spring Boot). This applies to the Transform Router (T-Router), the Shared File Store and each of the core Transform Engines (T-Engines). You can run multiple instances of each process. In this scenario, the Shared File Store would need to be load-balanced.

For the T-Engines that call external transformers, you would also need to set up those transformers (if not already installed), e.g. LibreOffice and ImageMagick. You can refer back to the existing ACS Repository docs for hints.

As a pure PoC (unsupported) you might wish to also take a look at Alfresco 6.1 in Azure (VM based deployment) which runs the Transform Service in VMs rather than Containers.

And finally, with the upcoming ACS 6.2 (including the next release of ACS Community), we have also open-sourced the core T-Engines, which might help you get started. More info to follow, but for now see:

Hope the above helps.

Regards,

Jan

janv
Alfresco Employee

Quick follow-up: ACS Community 201910 EA is now available with core T-Engines. Please refer to:

https://hub.alfresco.com/t5/alfresco-content-services-hub/alfresco-community-edition-201910-ea-relea...

janv
Alfresco Employee

Please see new blog https://hub.alfresco.com/t5/alfresco-content-services-blog/quick-reference-for-transformers-in-acs-c... ( from @angelborroy ) for building custom T-Engines (Transform Engines) to deploy and run with either ACS Community 201911 GA &/or ACS Enterprise 6.2.