Elasticsearch Deployment on AWS EC2 Instance

amitsingh
Alfresco Employee

Deploy Master and Data Nodes on AWS EC2 Instances

We used three instances: one master-eligible node that also works as a data node, and two nodes that work as data nodes only. You can use any number of EC2 instances; this setup uses three, with the following configuration:

AMI ID : ami-08153220276a5d89b (RHEL 8; AMI IDs are region-specific, so use the equivalent RHEL 8 AMI in your own region)

EC2 Instance type : r5.xlarge (4 vCPUs, 32 GiB memory)

Attached EBS : 1500 GiB (the volume size depends on how much data will be indexed: use roughly 1.5 TB for 80 million documents, 10 TB for 500 million and 20 TB for 1,000 million)

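Security Group : use the same security group as ACS (Alfresco Content Services), or another security group that lets ACS reach the cluster on port 9200 (HTTP) and lets the nodes reach each other on port 9300 (transport)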

Private IPs of the three launched instances : ["10.0.2.85", "10.0.2.81", "10.0.2.68"]
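
If the cluster gets its own security group, ports 9200 (HTTP) and 9300 (transport) can be opened with the AWS CLI command authorize-security-group-ingress. This is a minimal sketch, not part of the original setup: SG_ID and VPC_CIDR are placeholders you must replace, and with DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: open the Elasticsearch ports on an existing security
# group with the AWS CLI. SG_ID and VPC_CIDR are placeholders, not values
# from this setup. With DRY_RUN=1 (the default) the commands are only printed.
SG_ID="${SG_ID:-sg-0123456789abcdef0}"   # placeholder security group ID
VPC_CIDR="${VPC_CIDR:-10.0.0.0/16}"      # placeholder CIDR of the VPC/subnets
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# 9200: HTTP (ACS, bastion); 9300: transport traffic between the nodes.
open_es_ports() {
  local port
  for port in 9200 9300; do
    run aws ec2 authorize-security-group-ingress \
      --group-id "$SG_ID" --protocol tcp --port "$port" --cidr "$VPC_CIDR"
  done
}
```

To apply the rules, export real SG_ID and VPC_CIDR values and DRY_RUN=0 before sourcing the script, then call open_es_ports. Restricting the source to the VPC CIDR, rather than 0.0.0.0/0, keeps the unauthenticated HTTP port off the internet.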

Install and Configure Elasticsearch on the Instances

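Of the three EC2 instances, one acts as both master and data node and the other two act as data-only nodes. In Elasticsearch 7.x every node is master-eligible by default (node.master: true), so the data-only nodes must set node.master: false explicitly. Note also that with a single master-eligible node there is no master failover: if that node goes down, the cluster cannot elect a new master.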

1st EC2 Instance: Master and Data Node

In this example, its private IP is 10.0.2.68.

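Import the Elastic signing key, then install Elasticsearch 7.5.1 from the RPM package:

sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo rpm -i https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.1-x86_64.rpm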

Reload systemd so that it picks up the elasticsearch.service unit installed by the package (this reruns all generators, reloads all unit files and recreates the entire dependency tree):

sudo systemctl daemon-reload

Enable the Elasticsearch service so that it starts automatically at boot:

sudo systemctl enable elasticsearch.service
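
Because the same install, reload and enable steps are repeated on every instance, they can be wrapped in one function. This is a minimal sketch; install_es_node, run and DRY_RUN are hypothetical names, and with DRY_RUN=1 (the default) the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for the per-node steps above: install the 7.5.1 RPM,
# reload systemd and enable the service. DRY_RUN=1 only prints the commands.
ES_RPM_URL="https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.1-x86_64.rpm"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

install_es_node() {
  # On a real run, skip the RPM install if the package is already present,
  # so the function can be rerun safely.
  if [ "$DRY_RUN" != "1" ] && rpm -q elasticsearch >/dev/null 2>&1; then
    echo "elasticsearch already installed" >&2
  else
    run sudo rpm -i "$ES_RPM_URL"
  fi
  run sudo systemctl daemon-reload
  run sudo systemctl enable elasticsearch.service
}
```

Set DRY_RUN=0 before sourcing the script on an instance to run the commands for real; the rpm -q check then skips the package install on a rerun.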

Open elasticsearch.yml for editing:

sudo vi /etc/elasticsearch/elasticsearch.yml

Add or modify the following settings in elasticsearch.yml:

cluster.name: my-application
node.name: node-1
network.host: 10.0.2.68
http.port: 9200
discovery.seed_hosts: ["10.0.2.85", "10.0.2.81", "10.0.2.68"]
cluster.initial_master_nodes: ["10.0.2.68"]
node.master: true
node.data: true
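
The three elasticsearch.yml files differ only in node.name, network.host and the node role, so they can be rendered from one template. The helper below is a hypothetical sketch: render_es_config is not part of Elasticsearch, the seed hosts and initial master are the IPs used in this article, and the "data" role writes node.master: false so that the node is not master-eligible.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render the elasticsearch.yml settings from this article
# for one node. Role "master" = master-eligible data node (1st instance);
# role "data" = data-only node (2nd and 3rd instances). Uses the 7.x
# node.master / node.data settings that match Elasticsearch 7.5.1.
SEED_HOSTS='["10.0.2.85", "10.0.2.81", "10.0.2.68"]'
INITIAL_MASTERS='["10.0.2.68"]'

render_es_config() {
  local name="$1" ip="$2" role="$3" master
  case "$role" in
    master) master=true ;;
    data)   master=false ;;
    *) echo "usage: render_es_config <node-name> <ip> master|data" >&2; return 2 ;;
  esac
  # Unquoted EOF so that the variables are expanded.
  cat <<EOF
cluster.name: my-application
node.name: ${name}
network.host: ${ip}
http.port: 9200
discovery.seed_hosts: ${SEED_HOSTS}
cluster.initial_master_nodes: ${INITIAL_MASTERS}
node.master: ${master}
node.data: true
EOF
}
```

For example, on the second instance, instead of editing by hand: render_es_config node-2 10.0.2.81 data | sudo tee -a /etc/elasticsearch/elasticsearch.yml. Appending with tee -a rather than overwriting keeps the package defaults such as path.data and path.logs; run it only once per node so the keys are not duplicated.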

Start Elasticsearch:

sudo systemctl start elasticsearch.service

Check the service status:

sudo systemctl status elasticsearch.service

Expected response:

elasticsearch.service - Elasticsearch
   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2022-12-06 12:50:25 UTC; 22h ago
     Docs: http://www.elastic.co
 Main PID: 5528 (java)
    Tasks: 84 (limit: 201139)
   Memory: 28.1G
   CGroup: /system.slice/elasticsearch.service
           ├─5528 /usr/share/elasticsearch/jdk/bin/java -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF>
           └─5622 /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/controller

Dec 06 12:50:14 ip-10-0-2-68.eu-west-2.compute.internal systemd[1]: Starting Elasticsearch...
Dec 06 12:50:15 ip-10-0-2-68.eu-west-2.compute.internal elasticsearch[5528]: OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a fut>
Dec 06 12:50:25 ip-10-0-2-68.eu-west-2.compute.internal systemd[1]: Started Elasticsearch.
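
systemctl only shows that the process is running. To confirm that the node also answers HTTP requests on its network.host address, query the root endpoint (GET /), which returns the node name, cluster name and version. The sketch below extracts the version number with sed so that no extra tools such as jq are needed; es_version and check_node are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: read the JSON returned by GET / on stdin and print the
# value of the "number" key (the Elasticsearch version). Works for both
# ?pretty output and compact JSON.
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeeds if the node at $1 (host:port) answers with the version given in $2.
check_node() {
  local addr="$1" want="$2" got
  got="$(curl -s --max-time 5 "http://${addr}/" | es_version)"
  [ "$got" = "$want" ]
}
```

For example, from the bastion host: check_node 10.0.2.68:9200 7.5.1 && echo "node-1 is up".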

2nd EC2 Instance: Data Node

In this example, its private IP is 10.0.2.81.

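Import the Elastic signing key, then install Elasticsearch 7.5.1 from the RPM package:

sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo rpm -i https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.1-x86_64.rpm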

Reload systemd so that it picks up the elasticsearch.service unit installed by the package (this reruns all generators, reloads all unit files and recreates the entire dependency tree):

sudo systemctl daemon-reload

Enable the Elasticsearch service so that it starts automatically at boot:

sudo systemctl enable elasticsearch.service

Open elasticsearch.yml for editing:

sudo vi /etc/elasticsearch/elasticsearch.yml

Add or modify the following settings in elasticsearch.yml:

cluster.name: my-application
node.name: node-2
network.host: 10.0.2.81
http.port: 9200
discovery.seed_hosts: ["10.0.2.85", "10.0.2.81", "10.0.2.68"]
cluster.initial_master_nodes: ["10.0.2.68"]
node.master: false
node.data: true
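
Before starting a data-only node it is worth checking its configuration, because in Elasticsearch 7.x node.master defaults to true and a node without node.master: false is still master-eligible. The function below is a hypothetical sketch: check_data_node_config checks only the keys used in this article and ignores commented-out lines.

```shell
#!/usr/bin/env bash
# Hypothetical pre-start check for a data-only node. It only looks at the
# keys used in this article and ignores commented-out (#) lines. The dots in
# the key names act as "any character" in the regex, which is loose but
# harmless for these keys.
check_data_node_config() {
  local file="$1" key status=0
  for key in cluster.name node.name network.host discovery.seed_hosts; do
    if ! grep -Eq "^[[:space:]]*${key}:" "$file"; then
      echo "missing setting: $key" >&2
      status=1
    fi
  done
  # In 7.x node.master defaults to true, so it must be set to false explicitly.
  if ! grep -Eq '^[[:space:]]*node\.master:[[:space:]]*false[[:space:]]*$' "$file"; then
    echo "node.master: false not set; this node is still master-eligible" >&2
    status=1
  fi
  return "$status"
}
```

Run it against /etc/elasticsearch/elasticsearch.yml from a shell that can read the file (for example a root shell) before systemctl start; a non-zero exit status plus the messages on stderr point at what is missing.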

Start Elasticsearch:

sudo systemctl start elasticsearch.service

Check the service status:

sudo systemctl status elasticsearch.service

Expected response: similar to the output shown for the first instance (Active: active (running)), except that the journal lines show this instance's own hostname, ip-10-0-2-81.eu-west-2.compute.internal, and its own PID.

3rd EC2 Instance: Data Node

In this example, its private IP is 10.0.2.85.

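Import the Elastic signing key, then install Elasticsearch 7.5.1 from the RPM package:

sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
sudo rpm -i https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.5.1-x86_64.rpm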

Reload systemd so that it picks up the elasticsearch.service unit installed by the package (this reruns all generators, reloads all unit files and recreates the entire dependency tree):

sudo systemctl daemon-reload

Enable the Elasticsearch service so that it starts automatically at boot:

sudo systemctl enable elasticsearch.service

Open elasticsearch.yml for editing:

sudo vi /etc/elasticsearch/elasticsearch.yml

Add or modify the following settings in elasticsearch.yml:

cluster.name: my-application
node.name: node-3
network.host: 10.0.2.85
http.port: 9200
discovery.seed_hosts: ["10.0.2.85", "10.0.2.81", "10.0.2.68"]
cluster.initial_master_nodes: ["10.0.2.68"]
node.master: false
node.data: true

Start Elasticsearch:

sudo systemctl start elasticsearch.service

Check the service status:

sudo systemctl status elasticsearch.service

Expected response: similar to the output shown for the first instance (Active: active (running)), except that the journal lines show this instance's own hostname, ip-10-0-2-85.eu-west-2.compute.internal, and its own PID.

The cluster is now complete: it has three data nodes, one of which is also the master node, while the other two are data-only nodes.
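
To confirm that all three nodes joined and that only node-1 is master-eligible, the _cat/nodes API can list each node's IP, roles, elected-master flag and name. The sketch below summarises that output with awk; summarize_nodes is a hypothetical name, and it assumes the 7.x role letters, where d is data and m is master-eligible.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: summarise the output of
#   GET _cat/nodes?h=ip,node.role,master,name
# Assumes 7.x role letters (d = data, m = master-eligible) and that the
# "master" column holds "*" for the elected master and "-" otherwise.
summarize_nodes() {
  awk '
    $1 == "ip" { next }          # skip the header row if ?v was used
    NF >= 4 {
      nodes++
      if ($2 ~ /d/) data++
      if ($2 ~ /m/) eligible++
      if ($3 == "*") elected = $4
    }
    END {
      printf "nodes=%d data=%d master_eligible=%d elected=%s\n", nodes, data, eligible, elected
    }'
}
```

For the configuration described here, curl -s 'http://10.0.2.68:9200/_cat/nodes?h=ip,node.role,master,name' | summarize_nodes should report three nodes, three data nodes and a single master-eligible node. A master_eligible count of 3 means node.master: false is missing on the data nodes.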

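Create the alfresco index with the desired number of primary and replica shards using the following curl command. This example uses 24 primary shards and 0 replicas; with no replicas, each shard exists on only one node, so losing a data node loses the shards it holds. Set number_of_replicas to 1 or more if you need redundancy.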

curl -XPUT 'http://10.0.2.68:9200/alfresco?pretty' -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 24,
    "number_of_replicas": 0
  }
}'
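
The number of shard copies the cluster holds is primaries * (1 + replicas), which is why the cluster health output for this index reports 24 active shards for 24 primaries and 0 replicas. The sketch below builds the same settings body and does this arithmetic; index_settings_body, total_shard_copies and shards_per_node are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build the index-settings JSON used above and compute
# how many shard copies the cluster will hold.
index_settings_body() {
  local shards="$1" replicas="$2"
  printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$shards" "$replicas"
}

# Total shard copies = primaries * (1 + replicas).
total_shard_copies() {
  echo $(( $1 * (1 + $2) ))
}

# Average shard copies per data node (integer division): primaries, replicas, nodes.
shards_per_node() {
  echo $(( $(total_shard_copies "$1" "$2") / $3 ))
}
```

For example: curl -XPUT 'http://10.0.2.68:9200/alfresco?pretty' -H 'Content-Type: application/json' -d "$(index_settings_body 24 0)". With three data nodes, shards_per_node 24 0 3 gives 8 shards per node on average; Elasticsearch decides the actual allocation.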

From the bastion host, run the following curl command to check the cluster health:

curl -X GET 'http://10.0.2.68:9200/_cluster/health?pretty'

Expected response:

{
  "cluster_name" : "my-application",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 24,
  "active_shards" : 24,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
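
The same check can be scripted, for example to gate the next deployment step. This hypothetical helper treats the cluster as ready when status is green, unassigned_shards is 0 and number_of_nodes matches the expected count. The sed-based json_field is a simplification that works for the flat, top-level fields of this response and is not a general JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: check the JSON returned by GET _cluster/health.
# json_field only handles flat, top-level fields of this response (one key
# per line with ?pretty, or compact JSON); it is not a general JSON parser.
json_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\"\{0,1\}\([^\",}]*\).*/\1/p" | head -n 1
}

# Succeeds when status is green, no shards are unassigned and the number of
# nodes equals the expected count passed as $1. Reads the JSON on stdin.
cluster_ok() {
  local want_nodes="$1" json status unassigned nodes
  json="$(cat)"
  status="$(printf '%s\n' "$json" | json_field status)"
  unassigned="$(printf '%s\n' "$json" | json_field unassigned_shards)"
  nodes="$(printf '%s\n' "$json" | json_field number_of_nodes)"
  [ "$status" = "green" ] && [ "$unassigned" = "0" ] && [ "$nodes" = "$want_nodes" ]
}
```

For example: curl -s 'http://10.0.2.68:9200/_cluster/health?wait_for_status=green&timeout=60s' | cluster_ok 3 && echo ready. The wait_for_status and timeout parameters make Elasticsearch hold the request until the cluster turns green or the timeout expires.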

Once these steps are complete, the cluster is ready to index the metadata, content and paths of the files in the repository (both existing files and files uploaded later), which are then used to serve search results.