How to Configure NGINX High Availability Cluster Using Pacemaker on Ubuntu 16.04
Learn how to set up and configure Pacemaker to create a high availability cluster with failover capabilities on multiple nodes.
Introduction
High availability describes a website or application that is durable and likely to operate continuously without failure for a long time. A highly available setup provides a number of fail-safes and typically aims for 99% uptime or better. Highly available systems are built from several components and can be scaled horizontally when needed, which improves their ability to serve content.
Pacemaker is an advanced, scalable high-availability cluster resource manager. It maximizes the availability of cluster resources by failing them over between cluster nodes, and it detects and recovers from node- and resource-level failures. For heartbeat and internal communication among cluster components, Pacemaker relies on the messaging and membership capabilities provided by Corosync.
In this tutorial, we will explain how to install and configure a two-node NGINX web server cluster using Pacemaker on Ubuntu 16.04 server.
Requirements
- Two fresh Alibaba Cloud instances with Ubuntu 16.04 server installed.
- A static IP address on each instance. In this tutorial, 192.168.0.102 is configured on the first instance and 192.168.0.103 on the second. We will use 192.168.0.104 as the floating IP address.
- A root password set up on both instances.
Launch Alibaba Cloud ECS Instance
First, log in to your Alibaba Cloud ECS Console and create a new ECS instance, choosing Ubuntu 16.04 as the operating system with at least 2GB RAM. Connect to your ECS instance and log in as the root user.
Once you are logged into your Ubuntu 16.04 instance, run the following command to refresh the package index, so that the later steps install the latest available packages.
apt-get update -y
Getting Started
Before starting, you will need to configure the hosts file on each server, so that the servers can reach each other by hostname.
You can do this by editing the /etc/hosts file on both servers.
nano /etc/hosts
Add the following lines:
192.168.0.102 node1
192.168.0.103 node2
Save and close the file when you are finished. Next, test hostname resolution by pinging each server by its hostname:
ping node1
ping node2
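If you would rather script the hosts entries than edit the file by hand, the step above could be sketched roughly as follows. The function name add_host_entry and the HOSTS_FILE variable are illustrative assumptions, not part of any standard tool. The function appends an entry only when the hostname is not already mapped, so it can be run more than once.

```shell
#!/bin/sh
# Hypothetical helper: append "IP NAME" to a hosts file unless NAME is already mapped.
# HOSTS_FILE defaults to /etc/hosts; point it at a scratch file to try the function safely.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

add_host_entry() {
    ip="$1"
    name="$2"
    # Match NAME as a whole field on a non-comment line, so "node1" does not match "node10".
    if grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$HOSTS_FILE" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$HOSTS_FILE"
}

# Example (as root, on each node):
#   add_host_entry 192.168.0.102 node1
#   add_host_entry 192.168.0.103 node2
```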
Install and Configure NGINX
Before setting up the High Availability web server, you will need to install and configure NGINX on each of the nodes. You can install NGINX by running the following command:
apt-get install nginx -y
Once NGINX is installed, start the NGINX service and enable it to start at boot by running the following commands on each of the nodes:
systemctl start nginx
systemctl enable nginx
Next, create a default NGINX index.html page on each node, so that you can tell which node is serving a request.
On Node1, open the index.html page:
nano /var/www/html/index.html
Remove all the lines and add the following lines:
<h1>
Nginx Cluster ::: Node1
</h1>
Save and close the file when you are finished.
On Node2, open the index.html page:
nano /var/www/html/index.html
Remove all the lines and add the following lines:
<h1>
Nginx Cluster ::: Node2
</h1>
Save and close the file when you are finished.
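The two test pages differ only in the node label, so they can also be written without an editor. The following is a minimal sketch under that assumption; the function name write_index_page and the DOCROOT variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: write the three-line test page for a given node label.
# DOCROOT defaults to the NGINX web root used in this tutorial.
DOCROOT="${DOCROOT:-/var/www/html}"

write_index_page() {
    label="$1"
    # Overwrites any existing index.html in DOCROOT.
    cat > "$DOCROOT/index.html" <<EOF
<h1>
Nginx Cluster ::: ${label}
</h1>
EOF
}

# Example: run `write_index_page Node1` on Node1 and `write_index_page Node2` on Node2.
```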
Now, stop the NGINX service on each node, because the cluster will start and stop NGINX from this point on:
systemctl stop nginx
Install Pacemaker, Corosync, and Crmsh
Next, you will need to install Pacemaker, Corosync, and Crmsh on each node. All of these packages are available in the default Ubuntu 16.04 repository, so you can install them with the following command:
apt-get install pacemaker corosync crmsh -y
Once the installation is completed, stop Pacemaker and Corosync services with the following command:
systemctl stop corosync
systemctl stop pacemaker
Configure Corosync
Next, you will need to configure Corosync on Node1 and generate the Corosync key for the cluster authentication.
Before starting, you will need to install haveged to generate random numbers for the Corosync key. You can install it with the following command:
apt-get install haveged -y
Next, generate the Corosync key by running the following command:
corosync-keygen
You should see the following output:
Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 920).
Writing corosync key to /etc/corosync/authkey.
You can confirm that the key file was created by listing the directory with the following command:
ls -l /etc/corosync/
Output:
-r-------- 1 root root 128 Feb 28 20:39 authkey
-rw-r--r-- 1 root root 3929 Oct 21 2015 corosync.conf
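The listing shows authkey with mode -r-------- (400), readable only by its owner. A small check like the one below can confirm the same mode on any node. It is only a sketch: the function name check_authkey_mode is hypothetical, and it assumes GNU stat, as shipped with Ubuntu.

```shell
#!/bin/sh
# Hypothetical check: succeed only if the key file exists and its mode is 400.
# Relies on GNU stat (-c %a prints the octal permission bits).
check_authkey_mode() {
    keyfile="${1:-/etc/corosync/authkey}"
    [ -f "$keyfile" ] || { echo "missing: $keyfile" >&2; return 1; }
    mode="$(stat -c %a "$keyfile")"
    if [ "$mode" != "400" ]; then
        echo "unexpected mode $mode on $keyfile (expected 400)" >&2
        return 1
    fi
}

# Example: check_authkey_mode && echo "authkey permissions look fine"
```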
Next, change the directory to /etc/corosync and remove the default configuration file:
cd /etc/corosync/
rm -rf corosync.conf
Next, create a new corosync.conf file as shown below:
nano corosync.conf
Add the following lines:
totem {
    version: 2
    cluster_name: lbcluster
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.0.102
        broadcast: yes
        mcastport: 5405
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
nodelist {
    node {
        ring0_addr: 192.168.0.102
        name: primary
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.0.103
        name: secondary
        nodeid: 2
    }
}
logging {
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: yes
    timestamp: on
}
service {
    name: pacemaker
    ver: 1
}
Save and close the file when you are finished.
Next, copy the Corosync authentication key and the configuration file from Node1 to Node2 with the following command:
scp /etc/corosync/* root@192.168.0.103:/etc/corosync/
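Before starting the cluster services, it can help to confirm that every address in the nodelist is the one you intended. The sketch below pulls the ring0_addr values out of a corosync.conf file; the function name list_ring_addrs is a hypothetical helper, and the parsing assumes one "ring0_addr: ADDRESS" pair per line, as in the file above.

```shell
#!/bin/sh
# Hypothetical helper: print every ring0_addr value found in a corosync.conf file.
# Leading indentation is ignored because awk splits fields on whitespace.
list_ring_addrs() {
    conf="${1:-/etc/corosync/corosync.conf}"
    awk '$1 == "ring0_addr:" { print $2 }' "$conf"
}

# Example: ping each configured node once from the current node.
#   for addr in $(list_ring_addrs); do ping -c 1 "$addr"; done
```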
Start Cluster Service
Now, start the Corosync and Pacemaker services on each of the nodes and enable them to start at boot with the following commands:
systemctl start corosync
systemctl enable corosync
systemctl start pacemaker
systemctl enable pacemaker
Once both services have been started, check the status of the service on both nodes with the following command:
crm status
If everything is fine, you should see the following output:
Last updated: Wed Feb 28 21:13:27 2018 Last change: Wed Feb 28 21:12:44 2018 by hacluster via crmd on primary
Stack: corosync
Current DC: primary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 0 resources configured
Online: [ primary secondary ]
Full list of resources:
You can also check the Corosync members with the following command:
corosync-cmapctl | grep members
You should see the IP address of both nodes in the following output:
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.0.102)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.0.103)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
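In the output above, each healthy member has a status line ending in "joined". If you want a single number rather than the full listing, a filter along these lines could count them. The function name count_joined_members is an assumption made for this sketch, and it expects the same line format as the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: count members whose status is "joined".
# Reads `corosync-cmapctl | grep members` output from standard input.
count_joined_members() {
    # grep -c prints the number of matching lines (0 when there are none).
    grep -c 'members\.[0-9]*\.status (str) = joined'
}

# Example: corosync-cmapctl | grep members | count_joined_members
```

On the two-node cluster built in this tutorial, the example pipeline should print 2 once both nodes have joined.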
Configure Cluster
Now we are ready to configure Pacemaker. We will run all Pacemaker commands on the primary node (Node1); Pacemaker automatically synchronizes cluster-related changes across all member nodes.
Next, you will need to disable STONITH. STONITH is Pacemaker's fencing mechanism, which forcibly isolates a faulty node from the cluster. This basic two-node setup has no fencing device configured, so we disable it. We also set the quorum policy to ignore, so that the surviving node keeps running resources when its peer goes down.
You can apply both settings with the following commands:
crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore
Now, verify your STONITH status and the quorum policy with the following command:
crm configure show
You should see the following output:
node 1: primary
node 2: secondary
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.14-70404b0 \
cluster-infrastructure=corosync \
cluster-name=debian \
stonith-enabled=false \
no-quorum-policy=ignore
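To confirm the two properties without reading the whole listing, the output can be piped through a small filter. This is a sketch only: has_property is a hypothetical function name, and it assumes the name=value layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the configuration text on standard input
# contains the property NAME set to VALUE (as NAME=VALUE).
has_property() {
    # $1 = property name, $2 = expected value
    grep -Eq "(^|[[:space:]])$1=$2([[:space:]]|\$)"
}

# Example:
#   crm configure show | has_property stonith-enabled false && echo "STONITH is disabled"
#   crm configure show | has_property no-quorum-policy ignore && echo "quorum policy is ignore"
```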
Pacemaker is now running and configured. Next, you will need to create two resources for the cluster: a virtual IP for the floating IP address and a web server resource for the NGINX service.
You can create the virtual IP resource for the floating IP using the crm command as shown below:
crm configure primitive virtual_ip ocf:heartbeat:IPaddr2 params ip="192.168.0.104" cidr_netmask="32" op monitor interval="10s" meta migration-threshold="10"
Next, create a web server resource using the following command:
crm configure primitive webserver ocf:heartbeat:nginx configfile=/etc/nginx/nginx.conf op start timeout="40s" interval="0" op stop timeout="60s" interval="0" op monitor interval="10s" timeout="60s" meta migration-threshold="10"
Next, check the status of the new resources with the following command:
crm resource status
You should see the following output:
virtual_ip (ocf::heartbeat:IPaddr2): Started
webserver (ocf::heartbeat:nginx): Started
Next, you will need to group the two resources, so that the floating IP and NGINX always run on the same node and fail over together. Add the virtual_ip and webserver resources to a new group named hakase_balancing by running the following command:
crm configure group hakase_balancing virtual_ip webserver
Next, check the status of the new group with the following command:
crm resource show
You should see the following output:
Resource Group: hakase_balancing
    virtual_ip (ocf::heartbeat:IPaddr2): Started
    webserver (ocf::heartbeat:nginx): Started
Test High Availability
The cluster configuration is now complete, and it's time to check the status of the nodes and the cluster.
You can do this with the following command:
crm status
You should see the following output:
Last updated: Wed Feb 28 21:35:21 2018 Last change: Wed Feb 28 21:34:50 2018 by root via cibadmin on primary
Stack: corosync
Current DC: primary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured
Online: [ primary secondary ]
Full list of resources:
Resource Group: hakase_balancing
    virtual_ip (ocf::heartbeat:IPaddr2): Started primary
    webserver (ocf::heartbeat:nginx): Started primary
You now have two nodes (primary and secondary), both with the status online.
Now, from a remote machine, open your web browser and type the URL http://192.168.0.104 (the floating IP). You should see the Node1 page.
Next, stop the cluster service on Node1 with the following command:
crm cluster stop
Now, check the cluster status on Node2 with the following command:
crm status
You should see that the primary node is offline and the secondary node is online, with both resources now running on secondary, as shown below:
Last updated: Wed Feb 28 22:00:59 2018 Last change: Wed Feb 28 21:46:57 2018 by root via cibadmin on primary
Stack: corosync
Current DC: secondary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured
Online: [ secondary ]
OFFLINE: [ primary ]
Full list of resources:
Resource Group: hakase_balancing
    virtual_ip (ocf::heartbeat:IPaddr2): Started secondary
    webserver (ocf::heartbeat:nginx): Started secondary
Now, from the remote machine, open your web browser and type the URL http://192.168.0.104 (the floating IP). You should see the Node2 page.
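The same failover test can be watched from a terminal on the remote machine instead of a browser. The sketch below assumes the test pages created earlier in this tutorial, which contain the text "Nginx Cluster :::" followed by the node label. The function names page_node and watch_failover are hypothetical, and nothing runs until you call watch_failover yourself.

```shell
#!/bin/sh
# Hypothetical helper: read the test page on standard input and print the
# node label that follows "Nginx Cluster :::".
page_node() {
    sed -n 's/.*Nginx Cluster ::: *\([A-Za-z0-9_-]*\).*/\1/p'
}

# Hypothetical loop: poll the floating IP once per second and print which node answers.
# Stop it with Ctrl+C.
watch_failover() {
    vip="${1:-192.168.0.104}"
    while true; do
        node="$(curl -s --max-time 2 "http://$vip/" | page_node)"
        echo "$(date +%T) ${node:-no response}"
        sleep 1
    done
}

# Example: start `watch_failover` on the remote machine, then run
# `crm cluster stop` on Node1 and watch the label change from Node1 to Node2.
```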
Troubleshoot Cluster
If your High Availability setup is not working as expected, you can use a few troubleshooting commands to find the reason.
crm_mon is a very useful tool for viewing the real-time status of your nodes and resources:
crm_mon
You should see the following output:
Last updated: Wed Feb 28 23:46:46 2018 Last change: Wed Feb 28 22:00:43 2018 by root via cibadmin on primary
Stack: corosync
Current DC: secondary (version 1.1.14-70404b0) - partition WITHOUT quorum
2 nodes and 2 resources configured
Online: [ secondary ]
OFFLINE: [ primary ]
Resource Group: hakase_balancing
virtual_ip (ocf::heartbeat:IPaddr2): Started secondary
webserver (ocf::heartbeat:nginx): Started secondary
You can see your cluster configuration using the following command:
crm configure show
Output:
node 1: primary
node 2: secondary
primitive virtual_ip IPaddr2 \
params ip=192.168.0.104 cidr_netmask=32 \
op monitor interval=10s \
meta migration-threshold=10
primitive webserver nginx \
params configfile="/etc/nginx/nginx.conf" \
op start timeout=40s interval=0 \
op stop timeout=60s interval=0 \
op monitor interval=10s timeout=60s \
meta migration-threshold=10
group hakase_balancing virtual_ip webserver
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.14-70404b0 \
cluster-infrastructure=corosync \
cluster-name=debian \
stonith-enabled=false \
no-quorum-policy=ignore
You can also troubleshoot the cluster by looking at the Corosync log using the following command:
tail -f /var/log/corosync/corosync.log
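When the log is long, it may be quicker to look only at lines that mention a problem. The following sketch filters a log file for the words "error" and "warning", ignoring case. The function name log_problems is hypothetical, and the keyword match is a rough heuristic that may also catch harmless lines containing those words.

```shell
#!/bin/sh
# Hypothetical helper: print the last N lines of a log that mention "error" or "warning".
# Defaults to the Corosync log path configured earlier and to 20 lines.
log_problems() {
    logfile="${1:-/var/log/corosync/corosync.log}"
    count="${2:-20}"
    grep -Ei 'error|warning' "$logfile" | tail -n "$count"
}

# Example: log_problems /var/log/corosync/corosync.log 50
```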
Congratulations! You now have a basic NGINX High Availability server setup using Corosync and Pacemaker on Ubuntu 16.04 server. For more information, refer to the official Pacemaker documentation.