Alfresco Search Enterprise 3.2 consists of Alfresco Content Services, an Elasticsearch server and the Elasticsearch connectors. According to the official documentation, there are also a number of prerequisites, such as ActiveMQ, a PostgreSQL database and the Transform Service. Note, however, that the Transform Service does not have to be running to extract general metadata.
In this post I will cover how to scale Elasticsearch (ES) indexing during both re-indexing and live indexing, and when to use the different ES connector jars.
2. Alfresco Search Enterprise (ASE)
Alfresco Content Services supports the Elasticsearch platform for searching within the repository using Alfresco Search Enterprise 3.2. The Alfresco Search Enterprise module consists of 6 jar files.
ASE Jar List
alfresco-elasticsearch-reindexing-3.2.0-app.jar: This all-in-one jar indexes content, metadata and path for content that already exists in the repository.
However, this particular jar comes with 3 parameters that can be configured according to business requirements.
# Reindexing services execution
alfresco.reindex.metadataIndexingEnabled = true
alfresco.reindex.contentIndexingEnabled = true
alfresco.reindex.pathIndexingEnabled = true
For example, to reindex metadata only, start the jar with alfresco.reindex.contentIndexingEnabled and alfresco.reindex.pathIndexingEnabled set to false, leaving alfresco.reindex.metadataIndexingEnabled set to true.
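As a minimal sketch, the Python helper below assembles the java -jar argument list for the all-in-one reindexing jar with the three services toggled. It assumes the jar accepts Spring Boot style --property=value command-line arguments; the function name reindex_command is hypothetical, and the connection settings the app also needs (database, Elasticsearch and so on) are deliberately left out.

```python
from typing import List

JAR = "alfresco-elasticsearch-reindexing-3.2.0-app.jar"


def reindex_command(metadata: bool = True, content: bool = True,
                    path: bool = True) -> List[str]:
    """Build a `java -jar` argument list toggling the three reindexing services.

    Hypothetical helper; assumes Spring Boot style --property=value arguments.
    """
    flags = {
        "alfresco.reindex.metadataIndexingEnabled": metadata,
        "alfresco.reindex.contentIndexingEnabled": content,
        "alfresco.reindex.pathIndexingEnabled": path,
    }
    if not any(flags.values()):
        raise ValueError("at least one indexing service must be enabled")
    args = ["java", "-jar", JAR]
    # Render Python booleans as lowercase true/false property values.
    args += [f"--{name}={str(value).lower()}" for name, value in flags.items()]
    return args


if __name__ == "__main__":
    # Metadata-only reindex: content and path indexing switched off.
    print(" ".join(reindex_command(content=False, path=False)))
```

Returning a list rather than a single string lets the result be passed straight to subprocess.run without shell quoting issues.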
Sample search queries to try out:
For metadata search: cm:name:'test', cm:author:admin, cm:title:'test'
For path search: PATH:"/app:company_home/st:sites/cm:test/cm:documentLibrary/*"
For content search: cm:content:'test'
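To run these queries outside the UI, one option is the public Search API of Alfresco Content Services. The sketch below only builds the JSON request body for a POST to /alfresco/api/-default-/public/search/versions/1/search using the AFTS query language; the helper search_body and the SAMPLE_QUERIES mapping are hypothetical, and actually sending the request (host, credentials) is left out.

```python
import json

# Public Search API endpoint, relative to the Alfresco host (not called here).
SEARCH_PATH = "/alfresco/api/-default-/public/search/versions/1/search"

# The sample queries from this post, keyed by what they exercise.
SAMPLE_QUERIES = {
    "metadata": "cm:name:'test'",
    "path": 'PATH:"/app:company_home/st:sites/cm:test/cm:documentLibrary/*"',
    "content": "cm:content:'test'",
}


def search_body(query: str, language: str = "afts") -> str:
    """Return the JSON body for a POST to SEARCH_PATH."""
    return json.dumps({"query": {"query": query, "language": language}})


if __name__ == "__main__":
    for kind, query in SAMPLE_QUERIES.items():
        print(kind, search_body(query))
```

If a metadata query returns hits but the content query does not, that is a hint that content indexing is disabled or has not caught up yet.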
There are 5 live indexing jars available in the ES connector distribution zip.
alfresco-elasticsearch-live-indexing-3.2.0-app.jar: This all-in-one jar indexes content, metadata and path for real-time changes, and combines the functionality of the 4 individual live indexing jars (mediation, metadata, content and path). Unlike the all-in-one reindexing jar, it gives us no control over what is indexed.
When to use other live indexing jars?
Use the individual jars when the business does not require full-text (content) indexing, or when deploying at scale.
To start alfresco-elasticsearch-live-indexing-mediation-3.2.0-app.jar, run the following command.
alfresco-elasticsearch-live-indexing-metadata-3.2.0-app.jar: Indexes metadata only. To start it, run the following command.
alfresco-elasticsearch-live-indexing-path-3.2.0-app.jar: Indexes path only.
alfresco-elasticsearch-live-indexing-content-3.2.0-app.jar: Indexes content only.
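The choice between the all-in-one jar and the individual jars can be sketched as a small decision function. This is a hypothetical helper, not part of the connector, and it assumes the mediation jar is always started whenever the individual jars are used.

```python
from typing import List

VERSION = "3.2.0"  # connector version used throughout this post


def live_indexing_jars(content_indexing: bool = True,
                       deploy_at_scale: bool = False) -> List[str]:
    """Return the live indexing jars to start (hypothetical helper).

    The all-in-one jar always indexes content, metadata and path, so the
    individual jars are chosen when content indexing is not required or
    when components are deployed and scaled separately.
    """
    if content_indexing and not deploy_at_scale:
        return [f"alfresco-elasticsearch-live-indexing-{VERSION}-app.jar"]
    # Assumption: mediation is always needed alongside the individual jars.
    components = ["mediation", "metadata", "path"]
    if content_indexing:
        components.append("content")
    return [f"alfresco-elasticsearch-live-indexing-{name}-{VERSION}-app.jar"
            for name in components]


if __name__ == "__main__":
    # No full-text requirement: mediation, metadata and path only.
    for jar in live_indexing_jars(content_indexing=False):
        print("java -jar", jar)
```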
3. Deploying at Scale
When designing highly available systems, deploying at scale is essential. The diagram below shows an optimized way of designing a highly available architecture.
Live-Indexing: Deploying at Scale
The mediation component is a single point of failure because it cannot be scaled. It is therefore essential to monitor the mediation component and, in case of a failure, to run the reindexing app for the period during which it was down.
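Once a mediation outage has been detected, the reindexing window can be derived from the failure and recovery times. The sketch below assumes the reindexing app offers a date-based job (alfresco.reindex.jobName=reindexByDate with alfresco.reindex.fromTime and alfresco.reindex.toTime in yyyyMMddHHmm format); these names and the format are my assumption and should be verified against the connector documentation for your version. The helper outage_reindex_args and the 15-minute safety margin are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

# Assumed timestamp format for fromTime/toTime (yyyyMMddHHmm); verify in the docs.
TIME_FORMAT = "%Y%m%d%H%M"


def outage_reindex_args(down_at: datetime, up_at: datetime,
                        margin: timedelta = timedelta(minutes=15)) -> List[str]:
    """Build assumed reindex-by-date arguments covering a mediation outage.

    The window is widened by `margin` on both sides so that changes made
    just before the failure was detected, or just after recovery, are covered.
    """
    if up_at <= down_at:
        raise ValueError("recovery time must be after failure time")
    start = down_at - margin
    end = up_at + margin
    return [
        "--alfresco.reindex.jobName=reindexByDate",
        f"--alfresco.reindex.fromTime={start.strftime(TIME_FORMAT)}",
        f"--alfresco.reindex.toTime={end.strftime(TIME_FORMAT)}",
    ]


if __name__ == "__main__":
    # Mediation down from 08:30 to 09:05 on a hypothetical day.
    print(outage_reindex_args(datetime(2024, 1, 10, 8, 30),
                              datetime(2024, 1, 10, 9, 5)))
```

Widening the window costs some duplicate indexing work, which is usually cheaper than missing updates near the edges of the outage.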
Re-indexing a large repository with a single reindexing process can take a long time. The two approaches below let you scale the reindexing process both vertically and horizontally.
3.2.1. Approach 1
In this approach, we use multiple EC2 instances for horizontal scaling and run multiple reindexing threads inside each instance.
Setting Up a Re-Indexer Instance
Copy alfresco-elasticsearch-connector-distribution-3.2 into each instance.
We ran 6 threads on the first instance and 5 threads on the second; adjust these numbers to suit your environment.
Run the command below once per thread, giving each one a unique port number and its own reindex.fromId and reindex.toId values, to run as many threads as needed on an instance.
To fetch by IDs, set alfresco.reindex.jobName=reindexByIds; this indexes the nodes whose database ALF_NODE.id values fall within the given interval.
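Choosing the ID ranges by hand for 11 threads is error-prone, so the split can be computed. The sketch below divides an ALF_NODE.id interval into near-equal slices and builds one reindexByIds command per thread, using the 6 + 5 layout described above. The helper names, the base port 8090 and the use of --server.port as the per-process port are assumptions; each thread is a separate JVM process, so ports only need to be unique within one instance.

```python
from typing import Dict, List

JAR = "alfresco-elasticsearch-reindexing-3.2.0-app.jar"


def id_boundaries(min_id: int, max_id: int, parts: int) -> List[int]:
    """Split ALF_NODE.id values min_id..max_id into `parts` near-equal slices.

    Returns parts + 1 boundaries; slice i runs from boundaries[i] to
    boundaries[i + 1], and the last boundary is max_id + 1.
    """
    total = max_id - min_id + 1
    if parts < 1 or total < parts:
        raise ValueError("need 1 <= parts <= number of ids")
    size, extra = divmod(total, parts)
    bounds = [min_id]
    for i in range(parts):
        # The first `extra` slices take one additional id each.
        bounds.append(bounds[-1] + size + (1 if i < extra else 0))
    return bounds


def plan_reindex(min_id: int, max_id: int, threads_per_instance: List[int],
                 base_port: int = 8090) -> Dict[str, List[List[str]]]:
    """Build one reindexByIds command per thread, grouped by instance."""
    bounds = id_boundaries(min_id, max_id, sum(threads_per_instance))
    plan: Dict[str, List[List[str]]] = {}
    slice_no = 0
    for inst, threads in enumerate(threads_per_instance, start=1):
        commands = []
        for t in range(threads):
            from_id, to_id = bounds[slice_no], bounds[slice_no + 1]
            slice_no += 1
            commands.append([
                "java", "-jar", JAR,
                f"--server.port={base_port + t}",  # unique per process on a host
                "--alfresco.reindex.jobName=reindexByIds",
                f"--alfresco.reindex.fromId={from_id}",
                f"--alfresco.reindex.toId={to_id}",
            ])
        plan[f"instance-{inst}"] = commands
    return plan


if __name__ == "__main__":
    # 6 threads on the first EC2 instance, 5 on the second.
    for host, commands in plan_reindex(1, 1_100_000, [6, 5]).items():
        for cmd in commands:
            print(host, " ".join(cmd))
```

Adjacent slices deliberately share a boundary (one slice's toId is the next slice's fromId). If toId is exclusive, the slices tile the interval exactly; if it is inclusive, each boundary node is simply indexed twice. Either way no node falls into a gap, which matters more than avoiding a handful of duplicate writes.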