Performance Testing AWS Elasticsearch (Version 7.10) with ACS 7.2.0, ES Connector 3.1.0

amitsingh · 27 Mar 2023

AWS Elasticsearch Setup With ACS

Elasticsearch 7.10 deployment using AWS Elasticsearch service

Elasticsearch was deployed in the same VPC as ACS, with a security group allowing all incoming traffic from ACS, the database, Transform Service, Shared File Store and the other required services.
The Elasticsearch infrastructure should be sized according to data volume, expected user load, cost and similar factors.

Certificate generation for Elasticsearch

Allow access on port 443 between the security groups of ACS and Elasticsearch
Connect to the ACS instance and generate the elasticsearch-certificate.cer file in /home/ec2-user using the command below

sudo echo | openssl s_client -servername <domain name of ES without https://> -connect <domain name of ES without https://>:443 2>/dev/null | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > elasticsearch-certificate.cer
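The host passed to `-servername` and `-connect` is the domain endpoint without the `https://` scheme. A minimal sketch, assuming a POSIX shell; the helper names `es_host`, `extract_pem` and `fetch_es_cert` are hypothetical, and `extract_pem` uses the same `sed` expression as the command above:

```shell
# Hypothetical helpers; names are illustrative, not part of ACS or AWS tooling.

# Strip a leading "https://" and a trailing "/" from an ES domain endpoint.
es_host() {
  endpoint="${1#https://}"
  printf '%s\n' "${endpoint%/}"
}

# Keep only the lines from BEGIN CERTIFICATE to END CERTIFICATE (inclusive).
extract_pem() {
  sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p'
}

# Fetch the certificate presented on port 443 and write it to the given file.
fetch_es_cert() {
  host="$(es_host "$1")"
  echo | openssl s_client -servername "$host" -connect "$host:443" 2>/dev/null \
    | extract_pem > "$2"
}
```

For example, `fetch_es_cert https://<domain endpoint> /home/ec2-user/elasticsearch-certificate.cer` produces the same file as the one-liner, while also accepting an endpoint pasted with its scheme.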

Import the generated certificate into /opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts and /opt/alfresco-share-services/jdk-11.0.2/lib/security/cacerts

sudo /opt/alfresco-content-services/jdk-11.0.2/bin/keytool -import -alias elasticsearch -keystore /opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts -file  /home/ec2-user/elasticsearch-certificate.cer -storepass changeit -noprompt
sudo /opt/alfresco-share-services/jdk-11.0.2/bin/keytool -import -alias elasticsearch -keystore /opt/alfresco-share-services/jdk-11.0.2/lib/security/cacerts -file  /home/ec2-user/elasticsearch-certificate.cer -storepass changeit -noprompt
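The two imports differ only in the installation root, so they can be wrapped in one function that also lists the new entry to confirm it landed. A minimal sketch, assuming the JDK sits in `<root>/jdk-11.0.2` as in the paths above and the truststore still uses the default `changeit` password; `import_es_cert` is a hypothetical name:

```shell
# Hypothetical helper: import the ES certificate into one installation's
# cacerts, then list the alias to confirm it is present.
import_es_cert() {
  root="$1"
  cert="$2"
  keytool="$root/jdk-11.0.2/bin/keytool"
  cacerts="$root/jdk-11.0.2/lib/security/cacerts"

  "$keytool" -import -alias elasticsearch -keystore "$cacerts" \
    -file "$cert" -storepass changeit -noprompt || return 1
  # -list exits non-zero if the alias is missing from the keystore.
  "$keytool" -list -alias elasticsearch -keystore "$cacerts" -storepass changeit
}
```

Run it from a root shell once per root (`/opt/alfresco-content-services` and `/opt/alfresco-share-services`). Note that `keytool -import` refuses to overwrite an existing alias, so re-importing a renewed certificate needs a `keytool -delete -alias elasticsearch` first.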

In alfresco-global.properties (/opt/alfresco-content-services and /opt/alfresco-share-services), set the truststore location under the keystore properties to the updated cacerts file, as below

encryption.ssl.truststore.location=/opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts

In setenv.sh (/opt/alfresco-content-services/tomcat/bin), add the cacerts path to the JAVA_OPTS variable after -Dalfresco.home

-Djavax.net.ssl.trustStore=/opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts

In setenv.sh (/opt/alfresco-share-services/tomcat/bin), add the cacerts path to the JAVA_OPTS variable after -Dalfresco.home

-Djavax.net.ssl.trustStore=/opt/alfresco-share-services/jdk-11.0.2/lib/security/cacerts
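For illustration only, a hypothetical excerpt of the content-services setenv.sh after this change. The real file will contain other JAVA_OPTS entries, and the `ALF_HOME` variable is an assumption used to keep the paths short; only the trustStore option is the change described above:

```shell
# Hypothetical excerpt of /opt/alfresco-content-services/tomcat/bin/setenv.sh.
# Other JAVA_OPTS entries from the real file are omitted.
ALF_HOME=/opt/alfresco-content-services
JAVA_OPTS="$JAVA_OPTS -Dalfresco.home=$ALF_HOME"
# Added: make the JVM trust the cacerts file that now holds the ES certificate.
JAVA_OPTS="$JAVA_OPTS -Djavax.net.ssl.trustStore=$ALF_HOME/jdk-11.0.2/lib/security/cacerts"
export JAVA_OPTS
```

The share-services setenv.sh follows the same pattern with `/opt/alfresco-share-services` as the root.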

Restart tomcat and share_tomcat (we are running Content Services and Share on Tomcat servers in an EC2 instance)

sudo service tomcat restart
sudo service share_tomcat restart

Run a curl command from the ACS instance to verify that ACS and ES can communicate.
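One way to run this check, as a minimal sketch: query the standard Elasticsearch `_cluster/health` endpoint of the domain. The `es_check` name and the 10-second timeout are assumptions:

```shell
# Hypothetical helper: query the cluster health endpoint of the ES domain.
# --fail makes curl return non-zero on HTTP errors (for example a 403 from
# the domain access policy); --max-time avoids hanging if port 443 is blocked.
es_check() {
  endpoint="${1%/}"
  curl --silent --show-error --fail --max-time 10 "$endpoint/_cluster/health?pretty"
}
```

`es_check https://<domain endpoint>` should print a JSON document with a `status` field; a timeout usually points at the security group rules rather than at the certificate.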

Changes in Alfresco Global Properties

Retain the Solr settings so that the Search Service page remains accessible in the ACS UI

### Solr ###
index.subsystem.name=solr6
#dir.keystore=${dir.root}/keystore/
dir.keystore=/opt/alfresco-content-services/keystore/metadata-keystore

Add the Elasticsearch subsystem

# Set the Elasticsearch subsystem
index.subsystem.name=elasticsearch
# Elasticsearch index properties
elasticsearch.indexName=alfresco
elasticsearch.createIndexIfNotExists=true
# Elasticsearch server properties
#elasticsearch.protocol=https
elasticsearch.host=https://<elasticsearch host name>.amazonaws.com
elasticsearch.port=443
elasticsearch.baseUrl=/
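Because `index.subsystem.name` is defined twice above, the later definition (`elasticsearch`) should be the one that takes effect, since with `java.util.Properties` a later entry for the same key overwrites an earlier one. A minimal sketch for checking which value wins in an edited file; `effective_prop` is a hypothetical helper with a deliberately simplified parser:

```shell
# Hypothetical helper: print the effective value of KEY in a .properties file.
# Simplified parser: handles "key=value" lines only (no ":" separators or
# line continuations); lines starting with # or ! are treated as comments.
# As with java.util.Properties, the last definition of a key wins.
effective_prop() {
  awk -v k="$2" '
    /^[ \t]*[#!]/ { next }
    {
      i = index($0, "=")
      if (i == 0) next
      key = substr($0, 1, i - 1)
      gsub(/^[ \t]+|[ \t]+$/, "", key)
      if (key == k) {
        val = substr($0, i + 1)
        sub(/^[ \t]+/, "", val)
        found = 1
      }
    }
    END { if (found) print val; else exit 1 }
  ' "$1"
}
```

For example, `effective_prop alfresco-global.properties index.subsystem.name` should print `elasticsearch` for the configuration above.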

Update the keystore properties to use the truststore that includes the generated ES certificate

### Keystore Properties ###
encryption.keystore.type=JCEKS
encryption.ssl.truststore.location=/opt/alfresco-content-services/jdk-11.0.2/lib/security/cacerts

Restart tomcat and share_tomcat (we are running Content Services and Share on Tomcat servers in an EC2 instance)

sudo service tomcat restart
sudo service share_tomcat restart

ACS Admin Console Update

Make the following changes in Search Service (Admin Console) after all the steps above are completed and tomcat and share_tomcat have been restarted (we are running Content Services and Share on Tomcat servers in an EC2 instance).

Search Service in Use: Select Elasticsearch

Elasticsearch Hostname: Enter the Elasticsearch domain endpoint without the https:// prefix

Port: 443 (used for HTTPS connections)

Secure Communications: Select https

Click Save and restart the ACS service to apply the changes

Indexing in Elasticsearch

Run a curl command from the ACS instance to verify that an index named alfresco was created in Elasticsearch

curl 'https://vpc-env-acs-large82-nodes3-3chcmyl2bamxwyxyyhor4yar7i.eu-west-2.es.amazonaws.com/_cat/indices?v'

Run a curl command to create the alfresco index in Elasticsearch manually with the desired number of shards

curl -XPUT 'https://vpc-env-acs-large82-nodes3-3chcmyl2bamxwyxyyhor4yar7i.eu-west-2.es.amazonaws.com:443/alfresco?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 10,
    "number_of_replicas": 0
  }
}'
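The request body can also be generated from variables, which makes it easier to apply a shard count derived from data-volume sizing. A minimal sketch with hypothetical helper names. Elasticsearch rejects a PUT for an index that already exists (`resource_already_exists_exception`), so with `elasticsearch.createIndexIfNotExists=true` the manual creation presumably has to happen before ACS first creates the index with default settings:

```shell
# Hypothetical helpers for creating the alfresco index with explicit shards.

# Build the index-creation body for the given primary shard and replica counts.
index_settings_json() {
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "${2:-0}"
}

# PUT the index; --fail makes curl exit non-zero on an ES error response,
# for example when the index already exists.
create_index() {
  endpoint="${1%/}"
  curl --silent --show-error --fail -X PUT "$endpoint/$2?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(index_settings_json "$3" "${4:-0}")"
}

# Show the settings actually applied to the index (number_of_shards etc.).
show_index_settings() {
  curl --silent --show-error --fail "${1%/}/$2/_settings?pretty"
}
```

For example, `create_index https://<domain endpoint> alfresco 10 0` sends the same body as the command above.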

Indexing Pre-populated data

Create an EC2 instance running Linux with 2 CPU cores and 8 GiB RAM
Attach the security groups of ACS, Elasticsearch, Transform Service (TS) and the database (DB)
Install Java 11 with the command below (the exact command may vary with the OS version)

sudo amazon-linux-extras install java-openjdk11

Copy the alfresco-elasticsearch-connector-distribution-3.1.0-A2 distribution from the Nexus repository and change to the folder where alfresco-elasticsearch-reindexing-3.1.0-A2-app.jar is located
Run the following command to start indexing the un-indexed data in a newly deployed environment

nohup java -Xmx6G -jar alfresco-elasticsearch-reindexing-3.1.0-A2-app.jar \
--alfresco.reindex.jobName=reindexByIds \
--spring.elasticsearch.rest.uris=https://vpc-env-acs-large82-nodes3-3chcmyl2bamxwyxyyhor4yar7i.eu-west-2.es.amazonaws.com:443 \
--spring.datasource.url=jdbc:postgresql://env-acs-large82-cluster.cluster-cd9ifkuhgqhi.eu-west-2.rds.amazonaws.com:5432/alfresco \
--spring.datasource.username=alfresco \
--spring.datasource.password=admin2019 \
--alfresco.accepted-content-media-types-cache.enabled=false \
--spring.activemq.broker-url=failover:\(ssl://b-044009cc-074a-4f4d-9223-a50260e5ad30-1.mq.eu-west-2.amazonaws.com:61617,ssl://b-044009cc-074a-4f4d-9223-a50260e5ad30-2.mq.eu-west-2.amazonaws.com:61617\) \
--spring.activemq.user=alfresco \
--spring.activemq.password='!Alfresco2019' \
--alfresco.reindex.fromId=0 \
--alfresco.reindex.toId=80000000 \
--alfresco.reindex.multithreadedStepEnabled=true \
--alfresco.reindex.concurrentProcessors=10 \
--alfresco.reindex.metadataIndexingEnabled=true \
--alfresco.reindex.contentIndexingEnabled=false \
--alfresco.reindex.pathIndexingEnabled=true \
--alfresco.reindex.pagesize=100 \
--alfresco.reindex.batchSize=100  &
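Because the job runs under `nohup`, its log goes to `nohup.out` in the working directory. Progress can also be followed from the Elasticsearch side by polling the document count of the index with the standard `_count` API. A minimal sketch; the helper names and the 60-second default interval are illustrative assumptions:

```shell
# Hypothetical helpers for following reindexing progress.

# Extract the top-level "count" value from an ES _count response on stdin.
parse_count() {
  sed -n 's/.*"count":\([0-9][0-9]*\).*/\1/p'
}

# Print a timestamped document count for INDEX every INTERVAL seconds.
watch_doc_count() {
  endpoint="${1%/}"
  index="${2:-alfresco}"
  interval="${3:-60}"
  while true; do
    printf '%s %s\n' "$(date '+%F %T')" \
      "$(curl --silent --fail "$endpoint/$index/_count" | parse_count)"
    sleep "$interval"
  done
}
```

`watch_doc_count https://<domain endpoint> alfresco` runs until interrupted; the count should keep growing while the reindexing app is working through the ID range.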

Scaling Up With Elasticsearch

Once ES is configured with ACS, an index must be created that will be used for all search operations. The index is divided into shards, and AWS Elasticsearch recommends keeping each shard between 25 GB and 50 GB so that searches stay fast. Once the shards are created and indexed data is stored, the number of primary shards cannot easily be changed, so the shard count has to be planned up front based on the expected data volume. For example, 1 billion files (with metadata and path indexed) take ~1.3 TB; keeping each shard at about 40 GB gives roughly 32 shards (1.3 TB / 40 GB ≈ 32.5). To leave room for an additional 500 million files, the total grows to around 2 TB, and keeping each shard at 40 GB then requires 50 shards.
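The shard arithmetic can be written as an integer ceiling, so that no shard exceeds the target size. A minimal sketch with a hypothetical `shard_count` helper, taking sizes in GB (1.3 TB as 1300 GB):

```shell
# Hypothetical helper: primary shards needed so that no shard exceeds the
# target size (integer ceiling of total / target, both in GB).
shard_count() {
  total_gb="$1"
  target_gb="$2"
  echo $(( (total_gb + target_gb - 1) / target_gb ))
}
```

Rounding up gives `shard_count 1300 40` = 33 (rather than 32.5 rounded down) and `shard_count 2000 40` = 50, matching the 2 TB case.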

Optimizing Elasticsearch For Optimum Performance And Cost

AWS recommends having at least 1 replica shard for each primary shard. Replica shards duplicate the content of the primary shards, which provides resilience and helps with very high traffic, since search requests can be served by replicas when the primary shards are overwhelmed. However, 1, 2 or 3 replica shards per primary need 2X, 3X or 4X the disk space respectively. In performance testing with up to 1 billion files, replica shards had very little impact on search query performance and were not worth the extra cost for that purpose. They remain useful for providing resilience to Elasticsearch.
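The storage multiplier follows directly from each replica being a full copy of its primary. A minimal sketch with a hypothetical `storage_with_replicas` helper (sizes in GB):

```shell
# Hypothetical helper: total storage for the primary data plus N replicas
# of every primary shard.
storage_with_replicas() {
  primary_gb="$1"
  replicas="$2"
  echo $(( primary_gb * (1 + replicas) ))
}
```

For the ~1.3 TB (1300 GB) index above, one replica per primary doubles the requirement to 2600 GB, and three replicas raise it to 5200 GB.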

AWS Elasticsearch can scale to higher user loads. For example, to support a user load of 100 where the application initially supported 50, there are two approaches. The first, more expensive approach is to increase the number of data nodes that host the shards; this adds the cost of the extra nodes and did not give satisfactory performance improvements in our tests. The second approach is to increase the IOPS capability of the data nodes by selecting EBS General Purpose (SSD) gp3 instead of General Purpose (SSD) gp2. Increasing the IOPS of the data nodes gave considerably better results than adding data nodes. gp3 is also up to 20% cheaper per GB than gp2, whose IOPS scale with volume size and therefore cannot be increased at a fixed EBS volume size.
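On an existing domain, the volume type can be switched with the AWS CLI. A sketch, assuming the `aws opensearch update-domain-config` command (the OpenSearch Service API also manages Elasticsearch 7.10 domains); the domain name, volume size, IOPS and throughput are placeholders to be sized for the workload, and the parameters should be checked against the current CLI reference before use:

```shell
# Hypothetical helper: build the --ebs-options shorthand for a gp3 volume.
gp3_ebs_options() {
  printf 'EBSEnabled=true,VolumeType=gp3,VolumeSize=%d,Iops=%d,Throughput=%d\n' \
    "$1" "$2" "$3"
}

# Apply it to a domain (needs AWS credentials allowed to update the domain).
# Arguments: domain name, volume size (GiB), IOPS, throughput (MiB/s).
switch_to_gp3() {
  domain="$1"
  shift
  aws opensearch update-domain-config --domain-name "$domain" \
    --ebs-options "$(gp3_ebs_options "$@")"
}
```

Keeping the option string in a separate function makes it easy to review the exact values before running `switch_to_gp3 <domain name> <size> <iops> <throughput>`.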
