Features/COLO/Managed HOWTO

Overview

This is a step-by-step guide to installing qemu-colo, setting up a pacemaker cluster, and configuring and running a qemu-colo cluster resource. The pacemaker setup shown here is minimal and should not be used in production. For more information about pacemaker configuration, see the pacemaker and corosync documentation.

It is assumed that you have two cluster nodes with the following IP addresses:

test-cluster-01 192.168.220.244
test-cluster-02 192.168.220.245

Setup

$ = run as normal user
# = run as root

On every node do the following:

Install Debian buster amd64 (https://www.debian.org/distrib/).

Install packages:

# apt-get -y install git build-essential wget nano bridge-utils corosync pacemaker crmsh python3 pkg-config libglib2.0-dev libpixman-1-dev

Workaround for a bug:

# wget https://snapshot.debian.org/archive/debian/20200129T091834Z/pool/main/l/linux/linux-libc-dev_4.19.98-1_amd64.deb
# dpkg -i linux-libc-dev_4.19.98-1_amd64.deb

Install qemu:

$ git clone --single-branch --depth 1 -b new_build https://github.com/Lukey3332/qemu.git
$ cd qemu
$ ./configure --target-list=x86_64-softmmu,i386-softmmu --enable-replication --enable-colo-ra --enable-kvm --prefix=/usr
$ make -j4; make
# make install

Configure networking:

Replace /etc/network/interfaces with the following. Adjust eth0 and the IP address as needed for each node.

auto lo
iface lo inet loopback

iface eth0 inet manual

auto br0
iface br0 inet static
 mtu 1500
 bridge_ports eth0
 address 192.168.220.244
 netmask 255.255.255.0
 gateway 192.168.220.1

Configure your DNS server in /etc/resolv.conf:

nameserver 192.168.220.1

Apply changes:

# ifdown eth0
# ifup br0

Configure local DNS:

Replace /etc/hosts on test-cluster-01 with the following:

127.0.0.1       localhost
127.0.1.1       test-cluster-01.home.intra  test-cluster-01

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

192.168.220.245 test-cluster-02.home.intra  test-cluster-02

Replace /etc/hosts on test-cluster-02 with the following:

127.0.0.1       localhost
127.0.1.1       test-cluster-02.home.intra  test-cluster-02

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

192.168.220.244 test-cluster-01.home.intra  test-cluster-01
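The two files differ only in which node is local and which is the peer. As a minimal sketch, the helper below prints the file content for one node so it can be reviewed before it is copied into place. The function name gen_hosts and its argument order are hypothetical; the home.intra domain is taken from the examples above.

```shell
# Hypothetical helper: print /etc/hosts content for one cluster node.
# Usage: gen_hosts LOCAL_NAME PEER_NAME PEER_IP [DOMAIN]
gen_hosts() {
    local_name=$1
    peer_name=$2
    peer_ip=$3
    domain=${4:-home.intra}   # default domain, as used in this guide
    cat <<EOF
127.0.0.1       localhost
127.0.1.1       ${local_name}.${domain}  ${local_name}

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

${peer_ip} ${peer_name}.${domain}  ${peer_name}
EOF
}

# Preview the file for test-cluster-01 (prints only, writes nothing).
gen_hosts test-cluster-01 test-cluster-02 192.168.220.245
```

On test-cluster-02 the arguments would be swapped: gen_hosts test-cluster-02 test-cluster-01 192.168.220.244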

Configure corosync:

Replace /etc/corosync/corosync.conf with the following:

# Please read the corosync.conf.5 manual page
totem {
        version: 2

        cluster_name: test-cluster
}

logging {
        # Log the source file and line where messages are being
        # generated. When in doubt, leave off. Potentially useful for
        # debugging.
        fileline: off
        # Log to standard error. When in doubt, set to yes. Useful when
        # running in the foreground (when invoking "corosync -f")
        to_stderr: yes
        # Log to a log file. When set to "no", the "logfile" option
        # must not be set.
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        # Log to the system log daemon. When in doubt, set to yes.
        to_syslog: yes
        # Log debug messages (very verbose). When in doubt, leave off.
        debug: off
        # Log messages with time stamps. When in doubt, set to hires (or on)
        #timestamp: hires
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        two_node: 1
}

nodelist {

        node {
                # Hostname of the node
                name: test-cluster-01
                # Cluster membership node identifier
                nodeid: 1

                ring0_addr: 192.168.220.244
        }
        node {
                # Hostname of the node
                name: test-cluster-02
                # Cluster membership node identifier
                nodeid: 2

                ring0_addr: 192.168.220.245
        }
}

Apply changes:

# systemctl enable corosync
# systemctl restart corosync
# systemctl enable pacemaker
# systemctl restart pacemaker
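
Before configuring resources, it can help to confirm that both nodes have joined the cluster. The sketch below counts the node names on the Online: line of one-shot crm_mon output. The helper name count_online_nodes is hypothetical, and the "Online: [ ... ]" line format is an assumption about the crm_mon text output that may differ between pacemaker versions.

```shell
# Hypothetical helper: count node names on the "Online:" line of
# crm_mon output read from stdin. Prints 0 if no such line exists.
# Assumes the text format "Online: [ node-a node-b ]".
count_online_nodes() {
    awk '/^[ \t]*Online:/ {
             n = 0
             # Every field except the label and the brackets is a node name.
             for (i = 1; i <= NF; i++)
                 if ($i != "Online:" && $i != "[" && $i != "]") n++
             print n
             found = 1
             exit
         }
         END { if (!found) print 0 }'
}

# Example (run as root on a cluster node):
#   crm_mon -1 | count_online_nodes
```

For this two-node setup the expected count is 2. If it is lower, check the corosync log configured above before continuing.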

Configure a qemu-colo cluster resource

Create images on all nodes:

# qemu-img create -f qcow2 /mnt/vms/vma.qcow2 10g

Show the resource agent's user guide, which explains the parameters and more:

# crm ra info ocf:qemu:colo
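
If crm cannot find the agent, it may be worth checking that the agent file exists. Under the OCF convention, ocf:PROVIDER:TYPE resolves to $OCF_ROOT/resource.d/PROVIDER/TYPE, with OCF_ROOT commonly being /usr/lib/ocf. The sketch below builds that path; the helper name ra_path is hypothetical, and whether "make install" from this qemu branch places the agent at that location is an assumption.

```shell
# Hypothetical helper: map an "ocf:PROVIDER:TYPE" spec to the path
# where the agent script is expected under the OCF convention.
ra_path() {
    spec=$1
    root=${OCF_ROOT:-/usr/lib/ocf}   # common default, assumed here
    provider=${spec#ocf:}            # strip the leading "ocf:"
    provider=${provider%%:*}         # keep the text before the next ":"
    agent=${spec##*:}                # keep the text after the last ":"
    printf '%s/resource.d/%s/%s\n' "$root" "$provider" "$agent"
}

# Example:
#   ls -l "$(ra_path ocf:qemu:colo)"
```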

Configure the resource (on one node only):

# crm configure primitive vma ocf:qemu:colo \
       meta target-role=Stopped \
       params active_hidden_dir="/mnt/vms" \
       options="-vnc :0 -enable-kvm -cpu qemu64,+kvmclock -m 512 -netdev bridge,br=br0,id=hn0 -device e1000,netdev=hn0 -device virtio-blk,drive=colo-disk0 -drive if=none,node-name=parent0,format=qcow2,file=/mnt/vms/vma.qcow2" \
       op start timeout=30s interval=0 \
       op stop timeout=10s interval=0 \
       op monitor role=Master interval=1000ms timeout=30s \
       op monitor role=Slave interval=1001ms timeout=30s \
       op notify timeout=30s interval=0 \
       op promote timeout=30s interval=0 \
       op demote timeout=120s interval=0

# crm configure clone vma_ms vma \
       meta promotable=true clone-max=2 promoted-max=1 notify=true target-role=Started

# crm_master -r vma -v 10

Show cluster status:

# crm_mon

The resource should be 'Master' on one node and 'Slave' on the other.
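
The start and promote operations configured above are allowed up to 30s each, so the roles may not appear immediately. A minimal polling sketch follows; the helper names wait_for and has_master are hypothetical, and the "Masters:" wording matched by grep is an assumption about the crm_mon text output.

```shell
# Hypothetical helper: run a command repeatedly until it succeeds
# or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        attempts=$((attempts - 1))
        # Sleep only if another attempt will follow.
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Succeeds when one-shot crm_mon output contains a "Masters:" line.
# The exact wording is an assumption and may vary by pacemaker version.
has_master() {
    crm_mon -1 | grep -q 'Masters:'
}

# Example (run as root on a cluster node):
#   wait_for 30 1 has_master && echo "vma has been promoted"
```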

For detailed error messages and resync status, look at the system log:

# journalctl -f