Storage backend

The storage backend consists of two hosts. Both hosts use DRBD to keep a block device synchronized between them. Heartbeat runs between the two hosts and serves all cluster nodes through a virtual IP address on the Ethernet interface. There is always one active and one passive host. If the active host fails, heartbeat automatically fails over to the secondary host, which resumes serving the storage backend.

The storage backend exports a cluster block device using GNBD. On this device, OCFS2 is used as a cluster file system. Every cluster node imports the GNBD device and mounts it. The disk images of all Xen virtual machines reside as files on this cluster file system.
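
For reference, this is roughly what each cluster node does once the backend described below is running. This is only a sketch: the mount point /srv/xen is an assumption, the export name and virtual IP are the ones configured later in this section, the imported device typically appears under /dev/gnbd/<export name>, and the OCFS2 cluster stack must already be configured on the node.

modprobe gnbd
gnbd_import -i 10.0.0.6                        # import all devices exported by the backend
mount -t ocfs2 /dev/gnbd/ocfscluster /srv/xen  # mount the shared cluster file system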

Setup

The setup of the storage backend consists of the following steps:

  • Setup LVM

  • Setup DRBD

  • Setup Red Hat Cluster Suite (GNBD)

  • Setup Heartbeat

LVM

On both storage nodes, a local logical volume has to be set up to serve as the backing device for DRBD. The volume should be the same size on both nodes.

apt-get install clvmd
pvcreate /dev/hda3
vgcreate vg1 /dev/hda3
lvcreate -L 10G -n cluster vg1
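
To verify that the logical volume was created with the expected size on each node:

lvdisplay /dev/vg1/cluster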

DRBD

In order to use DRBD together with the Xen kernel, both the DRBD kernel module and the userland tools need to be compiled from the latest stable release sources. The packages provided by Debian do not compile successfully against the Xen kernel. The following worked flawlessly:

wget http://oss.linbit.com/drbd/0.7/drbd-0.7.19.tar.gz
tar xvzf drbd-0.7.19.tar.gz
cd drbd-0.7.19
make KDIR=/usr/src/xen-3.0.2-2/linux-2.6.16-xen
make install
make tools
make install-tools
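
To make sure the freshly built module actually loads into the running Xen kernel (run depmod -a first if modprobe cannot find it):

modprobe drbd
cat /proc/drbd          # should report version: 0.7.19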

The first step of configuring DRBD is to create its configuration file /etc/drbd.conf:

resource cluster {
 protocol C;
 incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
 startup {
  degr-wfc-timeout 120;
 }
 disk {
  on-io-error detach;
 }
 net {}
 syncer {
  rate 5M;
  group 1;
 }
 on xenamo4 {
  device /dev/drbd0;
  disk /dev/vg1/cluster;
  address 10.0.0.4:7788;
  meta-disk internal;
 }
 on xenamo5 {
  device /dev/drbd0;
  disk /dev/vg1/cluster;
  address 10.0.0.5:7788;
  meta-disk internal;
 }
}
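
The configuration file must be identical on both nodes. As a quick sanity check, drbdadm can parse the file and print it back without touching any device:

drbdadm dump all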

Afterwards, the DRBD resource has to be brought up on both nodes:

drbdadm up all

The first time you start up the devices, they both start in secondary mode. On one host, you must force the device to become primary:

drbdadm -- --do-what-I-say primary all

Now DRBD starts to synchronize the two hosts. You can follow the progress with the following command:

cat /proc/drbd

As soon as the synchronization has finished, the output of the above command looks like this (here taken on the secondary node, hence st:Secondary/Primary):

version: 0.7.19 (api:78/proto:74)
SVN Revision: 2212 build by root@xenamo4, 2006-05-28 10:20:40
 0: cs:Connected st:Secondary/Primary ld:Consistent
    ns:0 nr:72451 dw:72451 dr:0 al:0 bm:128 lo:0 pe:0 ua:0 ap:0

Red Hat cluster suite

For the same reason as with DRBD, the Red Hat cluster suite also needs to be compiled from source:

apt-get install libsysfs-dev libxml2-dev
wget ftp://sources.redhat.com/pub/cluster/releases/\
cluster-1.02.00.tar.gz
tar xvzf cluster-1.02.00.tar.gz
cd cluster-1.02.00
./configure --kernel_src=/usr/src/xen-3.0.2-2/linux-2.6.16-xen
make install
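
Before continuing, it is worth checking that the newly built kernel modules can be loaded by the running Xen kernel; cman, dlm and gnbd are the ones this setup relies on:

depmod -a
modprobe cman
modprobe dlm
modprobe gnbd
lsmod | egrep 'cman|dlm|gnbd'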

Afterwards, all required kernel modules and tools are available. The next step is to configure the cluster by creating the file /etc/cluster/cluster.conf:

<?xml version="1.0"?>
<cluster name="xencluster" config_version="1">
<cman>
</cman>
<clusternodes>
<clusternode name="xenamo1">
        <fence>
                <method name="single">
                  <device name="human" nodename="xenamo1"/>
                </method>
        </fence>
</clusternode>
<clusternode name="xenamo2">
        <fence>
                <method name="single">
                  <device name="human" nodename="xenamo2"/>
                </method>
        </fence>
</clusternode>
<clusternode name="xenamo3">
        <fence>
                <method name="single">
                  <device name="human" nodename="xenamo3"/>
                </method>
        </fence>
</clusternode>
<clusternode name="xenamo4">
        <fence>
                <method name="single">
                  <device name="human" nodename="xenamo4"/>
                </method>
        </fence>
</clusternode>
<clusternode name="xenamo5">
        <fence>
                <method name="single">
                  <device name="human" nodename="xenamo5"/>
                </method>
        </fence>
</clusternode>
</clusternodes>
<fencedevices>
        <fencedevice name="human" agent="fence_manual"/>
</fencedevices>
</cluster>
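
The file has to be copied to /etc/cluster/cluster.conf on every cluster node. To join a node to the cluster by hand for a first test (the init scripts configured below do the same at boot time):

ccsd              # start the cluster configuration daemon
cman_tool join    # join the cluster
fence_tool join   # join the fence domain
cman_tool nodes   # verify that the node shows up as a member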

In order to start the cluster automatically at boot, the following files need to be created or edited:

Add the following modules to /etc/modules:

dm-mod
gfs
lock_dlm
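
To load them immediately without rebooting:

modprobe dm-mod
modprobe gfs
modprobe lock_dlm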

Create a start/stop script for the cluster LVM daemon at /etc/init.d/clvmd:

#! /bin/sh
#
# clvmd         Start/Stop script for the cluster LVM daemon
#
# Author:       Daniel Bertolo <dbertolo@hsr.ch>.
#
# Version:      @(#)clvmd  1.00  25-Jun-2006  dbertolo@hsr.ch
#
set -e
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DESC="The cluster LVM daemon"
NAME=clvmd
DAEMON=/sbin/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME
# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0
#
#       Function that starts the daemon/service.
#
d_start() {
        start-stop-daemon --start --quiet --pidfile $PIDFILE \
                --exec $DAEMON
}
#
#       Function that stops the daemon/service.
#
d_stop() {
        start-stop-daemon --stop --quiet --pidfile $PIDFILE \
                --name $NAME
}
#
#       Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
        start-stop-daemon --stop --quiet --pidfile $PIDFILE \
                --name $NAME --signal 1
}
case "$1" in
  start)
        echo -n "Starting $DESC: $NAME"
        d_start
        echo "."
        ;;
  stop)
        echo -n "Stopping $DESC: $NAME"
        d_stop
        echo "."
        ;;
  restart|force-reload)
        #
        #       If the "reload" option is implemented, move the "force-reload"
        #       option to the "reload" entry above. If not, "force-reload" is
        #       just the same as "restart".
        #
        echo -n "Restarting $DESC: $NAME"
        d_stop
        sleep 1
        d_start
        echo "."
        ;;
  *)
        # echo "Usage: $SCRIPTNAME {start|stop|restart|reload|force-reload}" >&2
        echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
        exit 1
        ;;
esac
exit 0
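
Make the script executable and give it a first test run:

chmod +x /etc/init.d/clvmd
/etc/init.d/clvmd start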

In order to provide the cluster name to the startup script, the file /etc/sysconfig/cluster needs to be created with the following content:

CCSD_OPTS=""
CLUSTERNAME="xencluster"

As the startup scripts installed by the Red Hat cluster suite are tailored to Red Hat based distributions, the following files need to be edited manually before they can be used on Debian. Comment out every line that calls one of the commands status, success or failure, as well as the line containing . /etc/init.d/functions. This way no status output is generated, but the scripts work.

  • /etc/init.d/ccsd

  • /etc/init.d/cman

  • /etc/init.d/fenced

The last step of configuring the cluster suite is to add the appropriate symlinks in the runlevels 0, 2 and 6 so that the cluster node is started on boot and stopped on shutdown. The services must be started in the following order, and only after DRBD:

update-rc.d ccsd defaults 71 52
update-rc.d cman defaults 72 51
update-rc.d fenced defaults 73 50
update-rc.d clvmd defaults 74 49

In the runlevels 0 (halt) and 6 (reboot) these links must run in reverse order, and before the DRBD service is stopped; the decreasing stop priorities above (52 down to 49) produce exactly this reverse order.
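
To double-check the result, list the kill links; they are executed in ascending order of their K numbers, so clvmd is stopped first and ccsd last:

ls /etc/rc0.d/ /etc/rc6.d/ | egrep 'ccsd|cman|fenced|clvmd'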

Heartbeat

The last service needed in order to run the storage backend is heartbeat:

apt-get install heartbeat

The configuration of heartbeat consists of the files described below. All these files need to be available on both hosts.

  • The file /etc/ha.d/ha.cf defines the other host that heartbeat shall monitor. In order not to rely on the network link alone, a null modem cable was used to connect the two hosts; this way heartbeat can tell whether only the network link is down or the other host has actually failed. The IP address in this file is the address of the other host; the file shown below is the one on xenamo4, so on xenamo5 the ucast line points to 10.0.0.4 instead:

keepalive 1
deadtime 10
warntime 5
initdead 60
udpport 694
baud 19200
serial /dev/ttyS0
ucast eth0 10.0.0.5
auto_failback off
watchdog /dev/watchdog
node xenamo4
node xenamo5
  • The file /etc/ha.d/haresources defines a virtual IP address that shall be active on one of the hosts, together with the DRBD resource it controls:

xenamo4 10.0.0.6/24 drbddisk::cluster \
IPaddr::10.0.0.6/24/eth0
  • The file /etc/ha.d/resource.d/drbddisk was installed together with DRBD, but it has to be adapted so that it also controls the GNBD export. Change the start section to look like this:

start)
    # try several times, in case heartbeat deadtime
    # was smaller than drbd ping time
    try=6
    while true; do
            $DRBDADM primary $RES && break
            let "--try" || exit 20
            sleep 1
    done
    vgchange -aly
    gnbd_serv
    gnbd_export -e ocfscluster -d /dev/drbd0
    ;;
  • This automatically exports the DRBD device via GNBD as soon as the host becomes primary. The stop section has to be adapted accordingly; note that the GNBD export has to be torn down before DRBD can be demoted, so the exec line must stay last:

stop)
    # remove the GNBD export and deactivate the volumes first,
    # otherwise the DRBD device is still in use and cannot be demoted
    gnbd_export -R
    gnbd_serv -k
    vgchange -aln
    # exec, so the exit code of drbdadm propagates
    exec $DRBDADM secondary $RES
    ;;
  • The last file that needs to be adapted is /etc/ha.d/authkeys, which defines a shared password on both nodes in order to secure the connection between them. The line auth 2 selects the sha1 key:

auth 2
1 crc
2 sha1 <your-password>
3 md5 <your-password>
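
Heartbeat refuses to start if the authkeys file is readable by anyone but root, so restrict its permissions on both hosts:

chmod 600 /etc/ha.d/authkeys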

After both hosts have been rebooted, your storage backend is online, offering a GNBD device on the virtual IP configured in /etc/ha.d/haresources.
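
A quick way to verify which storage host is currently active (a sketch; gnbd_export -l is assumed here to list the current exports):

cat /proc/drbd        # the active host reports st:Primary/Secondary
ip addr show eth0     # the virtual address 10.0.0.6 appears on the active host
gnbd_export -l        # the export ocfscluster should be listed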