CentOS/RHEL 6.4, Squid 3.1.10, IPv6 and TCP_MISS/503 errors

Since Squid 3.1.0, IPv6 support is “native” [1], with the following consequence: the most active IPv6 operation will be DNS, as IPv6 (AAAA) addresses are looked up for each website.

Unfortunately, until Squid 3.1.16, which introduced the configuration parameter dns_v4_first [2], you cannot change the order of Squid’s DNS queries: IPv6 AAAA queries always occur first. (Well, you could always recompile with --disable-ipv6, but I cannot afford to recompile anything in my environment.)
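
For the record, once you do run 3.1.16 or later, preferring IPv4 lookups is a one-liner; the squid.conf path below assumes a stock EL-style layout:

# Squid >= 3.1.16 only, so not EL 6.4's 3.1.10
echo "dns_v4_first on" >> /etc/squid/squid.conf
squid -k reconfigure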

This is where it hurts: some name servers out there on the Internet are so badly configured that they time out on AAAA queries instead of simply answering NXDOMAIN, or NOERROR with an empty AAAA answer.
It means that when Squid queries (directly or via a resolver) one of these buggy name servers for an AAAA record, the query times out (by default Squid retries for 15s, i.e. 3*dns_retransmit_interval) and the request fails with a TCP_MISS/503 code.
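
You can spot such a broken server easily with dig; the domain and name server below are made up:

# A sane server answers instantly (NOERROR or NXDOMAIN, possibly with an
# empty answer section); a broken one just sits there until the timeout.
dig AAAA www.example.com @ns1.example.com +time=5 +tries=1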

To summarize, EL 6.4 ships Squid 3.1.10, which is stuck between 3.1.0 (native IPv6) and 3.1.16 (which allows you to try IPv4 A queries first), and won’t work with broken name servers that don’t handle AAAA queries correctly…

PS: I’m interested in any insights about this and/or why it doesn’t fall back to IPv4 after the three retries.

[1] – http://wiki.squid-cache.org/Features/IPv6#IPv6_in_Squid
[2] – http://www.squid-cache.org/Versions/v3/3.1/cfgman/dns_v4_first.html

RHEL 6.4: Pacemaker 1.1.8, adding CMAN Support (and getting rid of “the plugin”)

Since RHEL 6.0, Pacemaker has been shipped as a Technology Preview (TP).
As explained in a blog post [1] from The Cluster Guy, you can choose between three different setups for membership and quorum data.

But since RHEL 6.4 and Pacemaker 1.1.8, in an attempt to move towards what is best supported by Red Hat (CMAN), you should consider dropping option 1 (aka “the plugin”) and moving to CMAN. It is not mandatory yet, but it will be soon enough, probably starting with 6.5. As a reminder, “the plugin” was configured like this:

$ cat /etc/corosync/service.d/pcmk
service {
        # Load the Pacemaker Cluster Resource Manager
        name: pacemaker
        ver:  1
}

Scared to move to CMAN? Probably a little, but remember that a tech preview cannot be considered stable and you cannot expect the product to stay consistent. Fortunately, we were warned about this in Red Hat’s release notes for 6.4, see [2]. What we were not informed about is the loss of crmsh (aka the crm shell), replaced by pcs. If you don’t want to migrate from crmsh to pcs, you’ll have to install crmsh from the opensuse.org repository [3], as sketched below.
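
For what it’s worth, installation boils down to something like this (a sketch only; adjust the repo file to your distro/mirror):

# Add the opensuse.org ha-clustering repository [3], then pull crmsh
wget -O /etc/yum.repos.d/ha-clustering.repo http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/network:ha-clustering.repo
yum install crmsh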

Anyway, let’s see how to migrate a production cluster without incident.
This how-to is just a mix of personal experience (also, thanks to Akee), the “Quickstart Red Hat” guide [4] from clusterlabs.org and the up-to-date “Clusters from Scratch” [5].

  • On a single node:
# Do not manage anything anymore. This is persistent across reboots.
crm configure property maintenance-mode=true
  • On all nodes:
# Shutdown the stack.
service pacemaker stop && service corosync stop

# Remove corosync from runlevels, CMAN will start corosync
chkconfig corosync off

# Install CMAN
yum install cman ccs

# Specify the cluster can start without quorum
sed -i.sed "s/.*CMAN_QUORUM_TIMEOUT=.*/CMAN_QUORUM_TIMEOUT=0/g" /etc/sysconfig/cman

# Get rid of the old "plugin"
rm /etc/corosync/service.d/pcmk

# Prepare your hosts file for the ring definitions
vim /etc/hosts
> 192.168.1.1 node01.example.com
> 192.168.100.1 node01_alt.example.com
> 192.168.2.1 node02.example.com
> 192.168.200.1 node02_alt.example.com

Okay, now that the environment is set up, we must define the ring(s) and the nodes, and configure CMAN to delegate fencing to Pacemaker. The resulting cluster.conf is sketched after the commands below.

  • On a single node:
# Define the cluster
ccs -f /etc/cluster/cluster.conf --createcluster pacemaker1

# Create redundant rings
ccs -f /etc/cluster/cluster.conf --addnode node01.example.com
ccs -f /etc/cluster/cluster.conf --addalt node01.example.com node01_alt.example.com
ccs -f /etc/cluster/cluster.conf --addnode node02.example.com
ccs -f /etc/cluster/cluster.conf --addalt node02.example.com node02_alt.example.com

# Delegate fencing
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node01.example.com
ccs -f /etc/cluster/cluster.conf --addmethod pcmk-redirect node02.example.com
ccs -f /etc/cluster/cluster.conf --addfencedev pcmk agent=fence_pcmk
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node01.example.com pcmk-redirect port=node01.example.com
ccs -f /etc/cluster/cluster.conf --addfenceinst pcmk node02.example.com pcmk-redirect port=node02.example.com

# Encrypt rings and define the port (important to stick with the default port if SELinux is enforcing)
ccs -f /etc/cluster/cluster.conf --setcman keyfile="/etc/corosync/authkey" transport="udpu" port="5405"
# Finally, choose your favorite rrp_mode
ccs -f /etc/cluster/cluster.conf --settotem rrp_mode="active" 
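
At this point, /etc/cluster/cluster.conf should look roughly like the following; this is a hand-written sketch (config_version, attribute order, etc. will differ on a real system):

$ cat /etc/cluster/cluster.conf
<?xml version="1.0"?>
<cluster config_version="10" name="pacemaker1">
  <clusternodes>
    <clusternode name="node01.example.com" nodeid="1">
      <altname name="node01_alt.example.com"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="node01.example.com"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node02.example.com" nodeid="2">
      <altname name="node02_alt.example.com"/>
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="node02.example.com"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
  <cman keyfile="/etc/corosync/authkey" transport="udpu" port="5405"/>
  <totem rrp_mode="active"/>
</cluster>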

Now we must validate CMAN’s configuration and propagate it to the other nodes. This is only done once in the entire life of the cluster, as the resource-level configuration is obviously still maintained across all nodes in Pacemaker’s CIB.

ccs_config_validate -f /etc/cluster/cluster.conf
scp /etc/cluster/cluster.conf node02.example.com:/etc/cluster/cluster.conf
# If you were not already using corosync's secauth, generate the key (corosync-keygen), then copy it
scp /etc/corosync/authkey node02.example.com:/etc/corosync/authkey
  • On all nodes:
# Add CMAN to the runlevels
chkconfig cman on

# Start CMAN
service cman start

# Check the rings
corosync-objctl | fgrep members
# Check secauth, rrp_mode, transport etc.
corosync-objctl | egrep ^totem

# Start pacemaker
service pacemaker start
  • Finally, on a single node:
# Validate everything
crm_mon -Arf1

# Exit maintenance-mode
crm configure property maintenance-mode=false

At this point, everything should be back to normal and you won’t have to worry about anything other than resource management anymore ;)

[1] – http://blog.clusterlabs.org/blog/2012/pacemaker-and-cluster-filesystems/
[2] – https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/6.4_Technical_Notes/pacemaker.html
[3] – http://download.opensuse.org/repositories/network:/ha-clustering/CentOS_CentOS-6/network:ha-clustering.repo
[4] – http://clusterlabs.org/quickstart-redhat.html
[5] – http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/_adding_cman_support.html

Simple HTTP file sharing in bash

Just a little bash function, inspired by a post [1], that allows you to quickly share file(s) over the network using a one-shot HTTP response. It only requires nc to bind the listening socket.

function share() {
    ! type -a nc &>/dev/null && { echo "This machine doesn't have 'nc'" >&2 ; return 1 ; }
    [[ $# -ne 1 ]] && { echo "Usage: share <filename>" >&2 ; return 1 ; }
    [[ ! -f $1 ]] && { echo "No such file: $1" >&2 ; return 1 ; }
    local ip port
    port=$((8000 + RANDOM % 1000))
    if [[ $OSTYPE == linux-gnu ]] ; then
        # Print one URL per local IPv4 address
        while read ip ; do echo "http://$ip:$port/$1" ; done < <(/sbin/ip addr ls | awk '/inet / {split($2,a,"/");print a[1];}' | sort -n | uniq)
        # Headers, a mandatory blank line, then the file itself
        # (LF-only line endings; lenient HTTP clients accept them)
        cat - "$1" << EOF | $(type -p nc) -l -p "$port" -q 0
HTTP/1.1 200 OK
Content-Length: $(stat -c %s "$1")
Content-Type: $(file -b --mime-type "$1")

EOF
    # Check André's comment
    elif [[ $OSTYPE == darwin* ]] ; then   # unquoted so the glob matches e.g. darwin13
        ifconfig -u | \
            sed -ne 's/.*inet \([0-9.]*\) netmask .*/\1/p' | \
            sort -nu | while read ip ; do echo "http://$ip:$port/$1" ; done
        cat - "$1" << EOF | nc -l "$port"
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: $(eval $(stat -s "$1") && echo $st_size)

EOF
    else
        echo "Please provide a patch for other flavors of Linux/*BSD: <contact@floriancrouzat.net>" >&2
        return 1
    fi
}
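
A typical session looks like this (the IP address and the random port below are made up):

$ share backup.tar.gz
http://192.168.1.10:8842/backup.tar.gz
# then, from any other machine:
$ curl -O http://192.168.1.10:8842/backup.tar.gz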

[1] – http://www.vidarholen.net/contents/blog/?p=17