The storage backend consists of two hosts. Both hosts use DRBD to keep a block device synchronized. Between these hosts, Heartbeat is running and serves all cluster nodes via a virtual Ethernet interface. There is always one active and one passive host. If the active host fails, Heartbeat automatically switches over to the secondary host and resumes serving the storage backend.
The storage backend exports a cluster block device using GNBD. On this device, OCFS2 is used as a cluster file system. Every cluster node imports this GNBD device and mounts it. On this cluster file system, each Xen virtual machine resides as a single file.
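For illustration, this is roughly what each cluster node does once the backend described in the rest of this section is running. The export name ocfscluster and the virtual IP 10.0.0.6 are taken from the configuration further below; the mount point /xen is only an example, and OCFS2 itself (o2cb) is assumed to be configured on the node already:

modprobe gnbd                               # GNBD client kernel module
gnbd_import -i 10.0.0.6                     # import all GNBDs exported by the storage backend
mount -t ocfs2 /dev/gnbd/ocfscluster /xen   # mount the shared OCFS2 file system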
The setup of the storage backend consists of the following steps:
Setup LVM
Setup DRBD
Setup Redhat Cluster Suite (GNBD)
Setup Heartbeat
On both storage nodes, a local volume has to be set up for use by DRBD. The volume should be the same size on both nodes.
apt-get install clvmd
pvcreate /dev/hda3
vgcreate vg1 /dev/hda3
lvcreate -L 10G -n cluster vg1
In order to use DRBD together with the Xen kernel, both the DRBD kernel module and the userland tools need to be compiled from the latest stable release sources. The packages provided by Debian do not compile successfully against the Xen kernel. The following worked flawlessly:
wget http://oss.linbit.com/drbd/0.7/drbd-0.7.19.tar.gz
tar xvzf drbd-0.7.19.tar.gz
cd drbd-0.7.19
make KDIR=/usr/src/xen-3.0.2-2/linux-2.6.16-xen
make install
make tools
make install-tools
The first step of configuring DRBD is to create the configuration file, which is located at /etc/drbd.conf:
resource cluster {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  startup {
    degr-wfc-timeout 120;
  }
  disk {
    on-io-error detach;
  }
  net {}
  syncer {
    rate 5M;
    group 1;
  }
  on xenamo4 {
    device    /dev/drbd0;
    disk      /dev/vg1/cluster;
    address   10.0.0.4:7788;
    meta-disk internal;
  }
  on xenamo5 {
    device    /dev/drbd0;
    disk      /dev/vg1/cluster;
    address   10.0.0.5:7788;
    meta-disk internal;
  }
}
Afterwards, the DRBD service has to be started on both nodes:
drbdadm up all
The first time you start up the devices, they both start in secondary mode. On one host, you must force the device to become primary:
drbdadm -- --do-what-I-say primary all
Now, DRBD starts to synchronize the hosts. You can follow the progress by executing the following command:
cat /proc/drbd
As soon as synchronization has finished, the output of the above command looks like this:
version: 0.7.19 (api:78/proto:74)
SVN Revision: 2212 build by root@xenamo4, 2006-05-28 10:20:40
 0: cs:Connected st:Secondary/Primary ld:Consistent
    ns:0 nr:72451 dw:72451 dr:0 al:0 bm:128 lo:0 pe:0 ua:0 ap:0
For the same reason as with DRBD, the Redhat cluster suite also needs to be compiled from source:
apt-get install libsysfs-dev libxml2-dev
wget ftp://sources.redhat.com/pub/cluster/releases/cluster-1.02.00.tar.gz
tar xvzf cluster-1.02.00.tar.gz
cd cluster-1.02.00
./configure --kernel_src=/usr/src/xen-3.0.2-2/linux-2.6.16-xen
make install
Afterwards, all required kernel modules and tools are available. The next step is to configure the cluster. For this, the file /etc/cluster/cluster.conf needs to be created:
<?xml version="1.0"?>
<cluster name="xencluster" config_version="1">
  <cman>
  </cman>
  <clusternodes>
    <clusternode name="xenamo1">
      <fence>
        <method name="single">
          <device name="human" nodename="xenamo1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="xenamo2">
      <fence>
        <method name="single">
          <device name="human" nodename="xenamo2"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="xenamo3">
      <fence>
        <method name="single">
          <device name="human" nodename="xenamo3"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="xenamo4">
      <fence>
        <method name="single">
          <device name="human" nodename="xenamo4"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="xenamo5">
      <fence>
        <method name="single">
          <device name="human" nodename="xenamo5"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="human" agent="fence_manual"/>
  </fencedevices>
</cluster>
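The same cluster.conf is needed on every node listed in it. A quick way to distribute it from this host, assuming the host names resolve and root SSH access to the other nodes is available:

for h in xenamo1 xenamo2 xenamo3 xenamo5; do
    ssh root@$h mkdir -p /etc/cluster
    scp /etc/cluster/cluster.conf root@$h:/etc/cluster/
done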
In order to start the cluster automatically at boot, the following files need to be edited:
* Add the following modules to /etc/modules (to load them right away without rebooting, see the note after this list):
dm-mod
gfs
lock_dlm
* Create a start/stop script for the cluster LVM daemon at /etc/init.d/clvmd:
#! /bin/sh
#
# clvmd         Start/Stop script for the cluster LVM daemon
#
# Author:       Daniel Bertolo <dbertolo@hsr.ch>.
#
# Version:      @(#)clvmd  1.00  25-Jun-2006  dbertolo@hsr.ch
#

set -e

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
DESC="The cluster LVM daemon"
NAME=clvmd
DAEMON=/sbin/$NAME
PIDFILE=/var/run/$NAME.pid
SCRIPTNAME=/etc/init.d/$NAME

# Gracefully exit if the package has been removed.
test -x $DAEMON || exit 0

#
#       Function that starts the daemon/service.
#
d_start() {
    start-stop-daemon --start --quiet --pidfile $PIDFILE \
        --exec $DAEMON
}

#
#       Function that stops the daemon/service.
#
d_stop() {
    start-stop-daemon --stop --quiet --pidfile $PIDFILE \
        --name $NAME
}

#
#       Function that sends a SIGHUP to the daemon/service.
#
d_reload() {
    start-stop-daemon --stop --quiet --pidfile $PIDFILE \
        --name $NAME --signal 1
}

case "$1" in
  start)
    echo -n "Starting $DESC: $NAME"
    d_start
    echo "."
    ;;
  stop)
    echo -n "Stopping $DESC: $NAME"
    d_stop
    echo "."
    ;;
  restart|force-reload)
    #
    #   If the "reload" option is implemented, move the "force-reload"
    #   option to the "reload" entry above. If not, "force-reload" is
    #   just the same as "restart".
    #
    echo -n "Restarting $DESC: $NAME"
    d_stop
    sleep 1
    d_start
    echo "."
    ;;
  *)
    # echo "Usage: $SCRIPTNAME {start|stop|restart|reload|force-reload}" >&2
    echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
    exit 1
    ;;
esac

exit 0
* In order to provide the cluster name to the startup scripts, the file /etc/sysconfig/cluster needs to be created with the following content:
CCSD_OPTS=""
CLUSTERNAME="xencluster"
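The modules listed in /etc/modules are only loaded at boot time. To load them right away, without rebooting (assuming the cluster suite build above installed them into the running kernel's module path):

modprobe dm-mod
modprobe gfs
modprobe lock_dlm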
As the startup scripts generated by the Redhat cluster suite are written for Redhat-based distributions, you need to edit the following files manually in order to use them on Debian. Comment out every line that calls one of the commands status, success or failure, as well as the line containing . /etc/init.d/functions. This way no status output is generated, but the scripts work. A rough sketch of how to script these edits is shown after the file list below.
/etc/init.d/ccsd
/etc/init.d/cman
/etc/init.d/fenced
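The following loop is only a rough sketch of these edits; it comments out the sourcing of /etc/init.d/functions and lines that begin with one of the three calls, but calls appearing later on a line (for example after &&) still have to be commented out by hand, so review each file afterwards:

for f in /etc/init.d/ccsd /etc/init.d/cman /etc/init.d/fenced; do
    # comment out the functions include and lines starting with one of the
    # Redhat-specific helper calls (case labels like "status)" are left alone)
    sed -i \
        -e 's|^\([[:space:]]*\)\. /etc/init\.d/functions|\1#. /etc/init.d/functions|' \
        -e 's|^\([[:space:]]*\)\(status\|success\|failure\)\([[:space:]].*\)\?$|\1#\2\3|' \
        "$f"
done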
The last step of configuring the cluster suite is to add the appropriate symlinks in runlevels 0, 2 and 6 so that the cluster node is started and stopped on boot and shutdown. The services must be started in the following order, and only after DRBD:
update-rc.d ccsd defaults 71 52
update-rc.d cman defaults 72 51
update-rc.d fenced defaults 73 50
update-rc.d clvmd defaults 74 49
These links must be set in the runlevels 0 (halt) and 6 (reboot) in reverse order and before the DRBD service is stopped.
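To check that the kill links ended up in the right order relative to DRBD, you can list one of the shutdown runlevels; the cluster services should appear with lower K numbers than the DRBD link, so that they are stopped first (the exact DRBD priority depends on its own init script):

ls /etc/rc0.d/ | egrep 'drbd|ccsd|cman|fenced|clvmd'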
The last service needed in order to run the storage backend is heartbeat:
apt-get install heartbeat
The configuration of heartbeat consists of the files described below. All these files need to be available on both hosts.
The file /etc/ha.d/ha.cf defines the other host that Heartbeat shall monitor. In order not to rely on the network link alone, a null-modem cable was used to connect the two hosts. This way, Heartbeat can tell whether only the network link is down or the other host has actually failed. The IP address in this file is the address of the other host:
keepalive 1
deadtime 10
warntime 5
initdead 60
udpport 694
baud 19200
serial /dev/ttyS0
ucast eth0 10.0.0.5
auto_failback off
watchdog /dev/watchdog
node xenamo4
node xenamo5
The file /etc/ha.d/haresources defines a virtual network interface that shall be active on one of the hosts, together with the DRBD resource it controls:
xenamo4 10.0.0.6/24 drbddisk::cluster \
        IPaddr::10.0.0.6/24/eth0
The file /etc/ha.d/resources/drbddisk was created when DRBD was installed, but it has to be adapted so that it also controls the GNBD export. Change the start section to look like this:
start)
    # try several times, in case heartbeat deadtime
    # was smaller than drbd ping time
    try=6
    while true; do
        $DRBDADM primary $RES && break
        let "--try" || exit 20
        sleep 1
    done
    vgchange -aly                              # activate the LVM volumes
    gnbd_serv                                  # start the GNBD server daemon
    gnbd_export -e ocfscluster -d /dev/drbd0   # export the DRBD device as "ocfscluster"
    ;;
This exports the DRBD device via GNBD automatically as soon as the host becomes primary. The stop section has to be adapted accordingly; note that the GNBD export has to be torn down before the DRBD device can be demoted to secondary:
stop)
    gnbd_export -R                 # remove all GNBD exports
    gnbd_serv -k                   # stop the GNBD server daemon
    vgchange -aln                  # deactivate the LVM volumes
    # exec, so the exit code of drbdadm propagates
    exec $DRBDADM secondary $RES
    ;;
The last file that needs to be set up is /etc/ha.d/authkeys, which defines a shared password on both nodes in order to secure the connection:
auth 2
1 crc
2 sha1 <your-password>
3 md5 <your-password>
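Heartbeat refuses to start if this file is readable by anyone but root, so restrict its permissions on both hosts:

chmod 600 /etc/ha.d/authkeys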
After both hosts have been rebooted, your storage backend is online, offering a GNBD device on the virtual IP configured in /etc/ha.d/haresources.
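To verify the setup, you can check on the active host that the virtual IP is up, the DRBD device is primary and the export is in place; the commands below are only a sketch (the alias interface name and the exact gnbd_export listing may differ depending on the versions used). Stopping Heartbeat on the active host (/etc/init.d/heartbeat stop) should then move the IP and the export to the other node.

ifconfig eth0:0   # the virtual IP 10.0.0.6 should be configured here
cat /proc/drbd    # should show st:Primary/Secondary on the active host
gnbd_export -l    # should list the ocfscluster export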