RHEL 6.4: Pacemaker 1.1.8, adding CMAN Support (and getting rid of “the plugin”)

Since RHEL 6.0, Pacemaker has shipped as a Technology Preview (TP).
As explained in a blog post[1] from The Cluster Guy, you could choose between three different setups for membership and quorum data.

But since RHEL 6.4 and Pacemaker 1.1.8, in an attempt to move towards what is best supported by Red Hat (CMAN), you should consider dropping option 1 (aka “the plugin”) and moving to CMAN. It is not mandatory yet, but it will be soon enough, probably starting with 6.5. As a reminder, “the plugin” was configured like this:

$ cat /etc/corosync/service.d/pcmk
service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver:  1
}

Scared to move to CMAN? Probably a little, but remember that a technology preview cannot be considered stable and you cannot expect the product to stay consistent. Fortunately, we were warned about this in Red Hat’s release notes for 6.4, see [2]. What we were not informed about is the loss of crmsh (aka the crm shell), replaced with pcs. If you don’t want to migrate from crmsh to pcs, you’ll have to install crmsh from the opensuse.org repository[3].

Anyway, let’s see how to migrate a production cluster without incident.
This how-to is a mix of personal experience (thanks also to Akee), the “Quickstart Red Hat”[4] guide from clusterlabs.org and the up-to-date “Clusters from Scratch”[5].

  • On a single node:
# Stop managing resources. This setting persists across reboots.
crm configure property maintenance-mode=true
  • On all nodes
# Shutdown the stack.
service pacemaker stop && service corosync stop

# Remove corosync from runlevels, CMAN will start corosync
chkconfig corosync off

# Install CMAN
yum install cman ccs

# Allow the cluster to start without quorum
sed -i.sed "s/.*CMAN_QUORUM_TIMEOUT=.*/CMAN_QUORUM_TIMEOUT=0/g" /etc/sysconfig/cman
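
The sed one-liner simply forces the quorum timeout to zero, whatever value ships in /etc/sysconfig/cman. As a quick self-contained demonstration on a scratch copy (the temporary file below stands in for the real config file):

```shell
# Demonstrate the substitution on a throw-away copy of the cman defaults
tmp=$(mktemp)
printf '%s\n' '# CMAN_QUORUM_TIMEOUT=45' > "$tmp"
sed -i.sed "s/.*CMAN_QUORUM_TIMEOUT=.*/CMAN_QUORUM_TIMEOUT=0/g" "$tmp"
result=$(cat "$tmp")
echo "$result"
rm -f "$tmp" "$tmp.sed"
```

Note that the pattern matches the whole line, so a commented-out default is rewritten to an active `CMAN_QUORUM_TIMEOUT=0` as well.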

# Get rid of the old "plugin"
rm /etc/corosync/service.d/pcmk

# Prepare your host file for rings definitions
vim /etc/hosts
> 192.168.1.1 node01.example.com
> 192.168.100.1 node01_alt.example.com
> 192.168.2.1 node02.example.com
> 192.168.200.1 node02_alt.example.com
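
Before going further it is worth confirming that every ring address actually resolves; a minimal check, using the example hostnames from the hosts file above (substitute your own):

```shell
# Verify that each ring address resolves (via /etc/hosts or DNS)
missing=0
for h in node01.example.com node01_alt.example.com \
         node02.example.com node02_alt.example.com; do
    if getent hosts "$h" > /dev/null; then
        echo "$h: ok"
    else
        echo "$h: NOT resolved -- fix /etc/hosts before continuing"
        missing=$((missing + 1))
    fi
done
echo "$missing unresolved name(s)"
```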

Okay, now that the environment is set up, we must define the ring(s) and the nodes. We must also configure CMAN to delegate fencing to Pacemaker.

  • On a single node:
# Define the cluster
ccs -f /etc/cluster/cluster.conf --createcluster pacemaker1

# Create redundant rings
ccs -f /etc/cluster/cluster.conf --addnode node01.example.com
ccs -f /etc/cluster/cluster.conf --addalt node01.example.com node01_alt.example.com
ccs -f /etc/cluster/cluster.conf --addnode node02.example.com
ccs -f /etc/cluster/cluster.conf --addalt node02.example.com node02_alt.example.com

# Delegate fencing
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node01.example.com
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node02.example.com
ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node01.example.com pcmk-redirect port=node01.example.com
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node02.example.com pcmk-redirect port=node02.example.com

# Encrypt the rings and define the port (important to stick with the
# default port, 5405, if SELinux is enforcing)
ccs -f /etc/cluster/cluster.conf --setcman keyfile="/etc/corosync/authkey" transport="udpu" port="5405"
# Finally, choose your favorite rrp_mode
ccs -f /etc/cluster/cluster.conf --settotem rrp_mode="active" 
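
For reference, the resulting /etc/cluster/cluster.conf should look roughly like the sketch below. The config_version and nodeid values are assigned by ccs and will differ on your system; trust `ccs_config_validate` rather than this sketch.

```xml
<cluster name="pacemaker1" config_version="1">
  <cman keyfile="/etc/corosync/authkey" transport="udpu" port="5405"/>
  <totem rrp_mode="active"/>
  <clusternodes>
    <clusternode name="node01.example.com" nodeid="1">
      <altname name="node01_alt.example.com"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="node01.example.com"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node02.example.com" nodeid="2">
      <altname name="node02_alt.example.com"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="node02.example.com"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="pcmk" agent="fence_pcmk"/>
  </fencedevices>
</cluster>
```

The fence_pcmk agent is what redirects CMAN's fencing requests to Pacemaker's own stonith layer, which is why each node gets a pcmk-redirect method pointing back at itself.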

Now we must validate CMAN’s configuration and propagate it to the other nodes. This is only done once in the entire life of the cluster, as the resource-level configuration is still maintained across all nodes in Pacemaker’s CIB.

ccs_config_validate -f /etc/cluster/cluster.conf
scp /etc/cluster/cluster.conf node02.example.com:/etc/cluster/cluster.conf
# Copy the corosync authkey too (if you were not using corosync's
# secauth before, generate one first with corosync-keygen)
scp /etc/corosync/authkey node02.example.com:/etc/corosync/authkey
  • On all nodes:
# Add CMAN to the runlevels
chkconfig cman on

# Start CMAN
service cman start

# Check the rings
corosync-objctl | fgrep members
# Check secauth, rrp_mode, transport etc.
corosync-objctl | egrep ^totem

# Start pacemaker
service pacemaker start
  • Finally, on a single node:
# Validate everything
crm_mon -Arf1

# Exit maintenance-mode
crm configure property maintenance-mode=false

At this point, everything should be back to normal and you won’t have to worry about anything other than resource management anymore ;)

[1] – http://blog.clusterlabs.org/blog/2012/pacemaker-and-cluster-filesystems/
[2] – https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/6.4_Technical_Notes/pacemaker.html
[3] – http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/network:ha-clustering.repo
[4] – http://clusterlabs.org/quickstart-redhat.html
[5] – http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_adding_cman_support.html

4 comments so far.

  1. Excellent. But please explain more about Pacemaker and how it will be useful in production cluster configuration and deployment.

    TY/SA

  2. The arguments for adding cman to corosync and pacemaker are not clear. Keeping a configuration simple and clean makes the installation easy to install, easy to configure, and easy to manage. Adding cman makes the corosync/pacemaker installation and configuration more complicated to install, configure, and manage, and cman’s benefits are not clearly stated. Many Linux HA installations on the internet simply use corosync and pacemaker, and work fine. A clear argument for the need for cman has not been demonstrated, except that Red Hat says to do it, and that is not sufficient. Thanks, Jim

    • Hey Jim, well the main idea here, I believe, was to move from a technical preview in RHEL 6.4 to a supported product with RHEL 6.5, and in order to do so they had to remove the preview part, aka the corosync 1.x plugin, and fall back to CMAN, as it’s been tested for years in Red Hat Cluster Suite (RHCS). But the final solution, starting with RHEL 7.0, is to fully use pacemaker + corosync 2. In the end, this choice was purely political ;)

  3. I have found pacemaker/corosync 2 very stable on RHEL 7. On the other hand, I tested pacemaker/corosync 1.4/cman on RHEL 6.6 and they weren’t as stable. I got one split-brain incident already. Does anyone know if there is a way to port/install the pacemaker/corosync 2 stack on RHEL 6?

    Huy
