Since RHEL 6.0, Pacemaker has shipped as a technology preview (TP).
As explained in a blog post [1] from The Cluster Guy, you can choose between three different setups for membership and quorum data.
But since RHEL 6.4 and Pacemaker 1.1.8, in an attempt to move towards what Red Hat supports best (CMAN), you should consider dropping option 1 (aka “the plugin”) and moving to CMAN. It is not mandatory yet, but it will be soon enough, probably starting with 6.5. As a reminder, “the plugin” was configured like this:
$ cat /etc/corosync/service.d/pcmk
service {
# Load the Pacemaker Cluster Resource Manager
name: pacemaker
ver: 1
}
Scared to move to CMAN? Probably a little, but remember that a tech preview cannot be considered stable, and you cannot expect the product to stay consistent. Fortunately, we were warned about this in Red Hat’s release notes for 6.4, see [2]. What we were not told about is the loss of crmsh (aka the crm shell), replaced by pcs. If you don’t want to migrate from crmsh to pcs, you’ll have to install crmsh from the opensuse.org repository [3].
Anyway, let’s see how to migrate a production cluster without incident.
This how-to is a mix of personal experience (also, thanks to Akee), the “Quickstart Red Hat” guide [4] from clusterlabs.org and the up-to-date “Clusters from Scratch” [5].
# Stop managing resources. This setting persists across reboots.
crm configure property maintenance-mode=true
# Shut down the stack.
service pacemaker stop && service corosync stop
# Remove corosync from runlevels, CMAN will start corosync
chkconfig corosync off
# Install CMAN
yum install cman ccs
# Allow the cluster to start without quorum
sed -i.sed "s/.*CMAN_QUORUM_TIMEOUT=.*/CMAN_QUORUM_TIMEOUT=0/g" /etc/sysconfig/cman
# Get rid of the old "plugin"
rm /etc/corosync/service.d/pcmk
# Prepare your hosts file for the ring definitions
vim /etc/hosts
> 192.168.1.1 node01.example.com
> 192.168.100.1 node01_alt.example.com
> 192.168.2.1 node02.example.com
> 192.168.200.1 node02_alt.example.com
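Before going further, it can be worth checking that the two edits above actually took effect. The sed call changes nothing if the file has no CMAN_QUORUM_TIMEOUT line at all, and a typo in /etc/hosts only shows up later when CMAN fails to start. The following is a minimal sketch of such a pre-flight check; the function names quorum_timeout_is_zero and missing_ring_names are made up for this post and are not part of any cluster tool.

```shell
#!/bin/sh
# Pre-flight checks (illustrative helpers, not part of cman or pacemaker).

# quorum_timeout_is_zero FILE
# Succeeds only if FILE holds an active (uncommented) CMAN_QUORUM_TIMEOUT=0 line.
quorum_timeout_is_zero() {
    grep -Eq '^[[:space:]]*CMAN_QUORUM_TIMEOUT=0[[:space:]]*$' "$1"
}

# missing_ring_names HOSTS_FILE NAME...
# Prints each NAME that has no entry in HOSTS_FILE.
# Comments are stripped; every column after the address counts as a name.
missing_ring_names() {
    hosts_file=$1
    shift
    for name in "$@"; do
        awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts_file" || echo "$name"
    done
}

# Example usage on a node:
#   quorum_timeout_is_zero /etc/sysconfig/cman || echo "quorum timeout not set"
#   missing_ring_names /etc/hosts node01.example.com node01_alt.example.com \
#                                 node02.example.com node02_alt.example.com
```

If missing_ring_names prints nothing, every ring name has an entry in the hosts file.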
Okay, now that the environment is set up, we must define the ring(s) and the nodes. We must also configure CMAN to delegate fencing to Pacemaker.
# Define the cluster
ccs -f /etc/cluster/cluster.conf --createcluster pacemaker1
# Create redundant rings
ccs -f /etc/cluster/cluster.conf --addnode node01.example.com
ccs -f /etc/cluster/cluster.conf --addalt node01.example.com node01_alt.example.com
ccs -f /etc/cluster/cluster.conf --addnode node02.example.com
ccs -f /etc/cluster/cluster.conf --addalt node02.example.com node02_alt.example.com
# Delegate fencing
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node01.example.com
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node02.example.com
ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node01.example.com pcmk-redirect port=node01.example.com
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node02.example.com pcmk-redirect port=node02.example.com
# Encrypt rings and define port (important to stick with default port if SELinux=enforcing)
ccs -f /etc/cluster/cluster.conf --setcman keyfile="/etc/corosync/authkey" transport="udpu" port="5405"
# Finally, choose your favorite rrp_mode
ccs -f /etc/cluster/cluster.conf --settotem rrp_mode="active"
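With more than two nodes, the per-node calls above get repetitive. Here is a small sketch that wraps them in a function, using only the ccs options already shown. The names add_cluster_node, CCS and CONF are my own; setting CCS=echo gives a dry run that prints the arguments instead of touching cluster.conf. I am assuming the fence device has to exist before --addfenceinst is called, which is why it is created first in the usage example.

```shell
#!/bin/sh
# Same ccs calls as above, driven per node.
# CCS and CONF are hypothetical knobs: override them for a dry run or another file.
CCS=${CCS:-ccs}
CONF=${CONF:-/etc/cluster/cluster.conf}

# add_cluster_node NODE ALTNAME
# Registers the node, its second ring and its fencing redirect to pacemaker.
add_cluster_node() {
    node=$1
    alt=$2
    "$CCS" -f "$CONF" --addnode "$node" &&
    "$CCS" -f "$CONF" --addalt "$node" "$alt" &&
    "$CCS" -f "$CONF" --addmethod pcmk-redirect "$node" &&
    "$CCS" -f "$CONF" --addfenceinst pcmk "$node" pcmk-redirect port="$node"
}

# Example usage (cluster and fence device first, then the nodes):
#   "$CCS" -f "$CONF" --createcluster pacemaker1
#   "$CCS" -f "$CONF" --addfencedev pcmk agent=fence_pcmk
#   add_cluster_node node01.example.com node01_alt.example.com
#   add_cluster_node node02.example.com node02_alt.example.com
```

The --setcman and --settotem calls are cluster-wide, so they stay outside the function and run once.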
Now we must validate CMAN’s configuration and propagate it to the other nodes. This is done only once in the life of the cluster, since the resource-level configuration is still maintained across all nodes in Pacemaker’s CIB.
ccs_config_validate -f /etc/cluster/cluster.conf
scp /etc/cluster/cluster.conf node02.example.com:/etc/cluster/cluster.conf
# If you were not already using corosync's secauth, copy the key as well (generate it first with corosync-keygen if it does not exist)
scp /etc/corosync/authkey node02.example.com:/etc/corosync/authkey
# Add CMAN to the runlevels
chkconfig cman on
# Start CMAN
service cman start
# Check the rings
corosync-objctl | fgrep members
# Check secauth, rrp_mode, transport etc.
corosync-objctl | egrep ^totem
# Start pacemaker
service pacemaker start
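Eyeballing the corosync-objctl output works, but on several nodes a scripted check is less error-prone. The sketch below pulls one totem setting out of that output so it can be compared against what was put in cluster.conf. It assumes the output is made of key=value lines and that the keys are named like totem.rrp_mode and totem.transport; check this against your own output first. The function name totem_value is made up.

```shell
#!/bin/sh
# totem_value KEY
# Reads corosync-objctl output on stdin and prints the value of totem.KEY.
# Assumes "key=value" lines; prints nothing if the key is absent.
totem_value() {
    awk -F= -v k="totem.$1" '$1 == k { print substr($0, length(k) + 2); exit }'
}

# Example usage on each node:
#   corosync-objctl | totem_value rrp_mode
#   corosync-objctl | totem_value transport
```

The values printed should match the rrp_mode and transport chosen with ccs earlier.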
- Finally, on a single node:
# Validate everything
crm_mon -Arf1
# Exit maintenance-mode
crm configure property maintenance-mode=false
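If crmsh is already gone from your nodes, the same property can be set with crm_attribute, which ships with Pacemaker itself (pcs users can run pcs property set maintenance-mode=false instead). A minimal sketch follows; the wrapper set_maintenance_mode and the CRM_ATTRIBUTE variable are my own additions, the latter only there to allow a dry run with CRM_ATTRIBUTE=echo.

```shell
#!/bin/sh
# CRM_ATTRIBUTE is a hypothetical knob: set it to "echo" for a dry run.
CRM_ATTRIBUTE=${CRM_ATTRIBUTE:-crm_attribute}

# set_maintenance_mode true|false
# Sets the cluster-wide maintenance-mode property without crmsh.
set_maintenance_mode() {
    case $1 in
        true|false) ;;
        *) echo "usage: set_maintenance_mode true|false" >&2; return 2 ;;
    esac
    "$CRM_ATTRIBUTE" --type crm_config --name maintenance-mode --update "$1"
}

# Example usage, on a single node:
#   set_maintenance_mode true     # before stopping the stack
#   set_maintenance_mode false    # once crm_mon looks healthy
```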
At this point, everything should be back to normal, and you won’t have to worry about anything other than resource management anymore ;)
[1] – http://blog.clusterlabs.org/blog/2012/pacemaker-and-cluster-filesystems/
[2] – https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/6.4_Technical_Notes/pacemaker.html
[3] – http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/network:ha-clustering.repo
[4] – http://clusterlabs.org/quickstart-redhat.html
[5] – http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_adding_cman_support.html