Features/COLO/Managed HOWTO
Overview
This is a step-by-step guide to installing qemu-colo, setting up a Pacemaker cluster, and configuring and running a qemu-colo cluster resource. This is a minimal Pacemaker setup and should not be used in production. For more information about Pacemaker configuration, see the Pacemaker and Corosync documentation.
It's assumed that you have two cluster nodes with the following IP addresses:
test-cluster-01 192.168.220.244
test-cluster-02 192.168.220.245
Setup
$ = run as normal user
# = run as root
On every node do the following:
Install Debian buster (amd64): https://www.debian.org/distrib/
Install packages:
# apt-get -y install git build-essential wget nano bridge-utils corosync pacemaker crmsh python3 pkg-config libglib2.0-dev libpixman-1-dev
Workaround for a bug:
# wget https://snapshot.debian.org/archive/debian/20200129T091834Z/pool/main/l/linux/linux-libc-dev_4.19.98-1_amd64.deb
# dpkg -i linux-libc-dev_4.19.98-1_amd64.deb
Install qemu:
$ git clone --single-branch --depth 1 -b new_build https://github.com/Lukey3332/qemu.git
$ cd qemu
$ ./configure --target-list=x86_64-softmmu,i386-softmmu --enable-replication --enable-colo-ra --enable-kvm --prefix=/usr
$ make -j4; make
# make install
Configure networking:
Replace /etc/network/interfaces with the following. Adjust eth0 and the IP address as needed for the node.
auto lo
iface lo inet loopback
iface eth0 inet manual
auto br0
iface br0 inet static
    mtu 1500
    bridge_ports eth0
    address 192.168.220.244
    netmask 255.255.255.0
    gateway 192.168.220.1
Configure your DNS server in /etc/resolv.conf:
nameserver 192.168.220.1
Apply changes:
# ifdown eth0
# ifup br0
Configure local DNS:
Replace /etc/hosts on test-cluster-01 with the following:
127.0.0.1 localhost
127.0.1.1 test-cluster-01.home.intra test-cluster-01
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.220.245 test-cluster-02.home.intra test-cluster-02
Replace /etc/hosts on test-cluster-02 with the following:
127.0.0.1 localhost
127.0.1.1 test-cluster-02.home.intra test-cluster-02
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
192.168.220.244 test-cluster-01.home.intra test-cluster-01
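Before moving on, it can help to confirm that both node names actually ended up in the hosts file on each node. The following is only a sketch: the function name check_cluster_hosts is hypothetical and not part of any package, and it inspects the file contents only, it does not test connectivity.

```shell
#!/bin/sh
# check_cluster_hosts FILE NAME...
# Succeeds only if every NAME appears as a hostname field (any field after
# the address) on a non-comment line of FILE.
# Hypothetical helper for illustration.
check_cluster_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments first, then compare each hostname field exactly.
        if ! sed 's/#.*//' "$hosts_file" | awk -v n="$name" '
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }'; then
            echo "missing: $name" >&2
            return 1
        fi
    done
    echo "all names present"
}

# Usage: check_cluster_hosts /etc/hosts test-cluster-01 test-cluster-02
```

Running it on both nodes with both node names should print "all names present"; a missing entry is reported on standard error.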
Configure corosync:
Replace /etc/corosync/corosync.conf with the following:
# Please read the corosync.conf.5 manual page
totem {
        version: 2
        cluster_name: test-cluster
}
logging {
        # Log the source file and line where messages are being
        # generated. When in doubt, leave off. Potentially useful for
        # debugging.
        fileline: off
        # Log to standard error. When in doubt, set to yes. Useful when
        # running in the foreground (when invoking "corosync -f")
        to_stderr: yes
        # Log to a log file. When set to "no", the "logfile" option
        # must not be set.
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        # Log to the system log daemon. When in doubt, set to yes.
        to_syslog: yes
        # Log debug messages (very verbose). When in doubt, leave off.
        debug: off
        # Log messages with time stamps. When in doubt, set to hires (or on)
        #timestamp: hires
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}
quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        two_node: 1
        wait_for_all: 1
}
nodelist {
        node {
                # Hostname of the node
                name: test-cluster-01
                # Cluster membership node identifier
                nodeid: 1
                ring0_addr: 192.168.220.244
        }
        node {
                # Hostname of the node
                name: test-cluster-02
                # Cluster membership node identifier
                nodeid: 2
                ring0_addr: 192.168.220.245
        }
}
Apply changes:
# systemctl enable corosync
# systemctl restart corosync
# systemctl enable pacemaker
# systemctl restart pacemaker
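After the services have been restarted on both nodes, the cluster may take a few seconds to form. One possible way to wait for that is sketched below. The helper names wait_until, nodes_online and cluster_ready are hypothetical, and the check assumes that the one-shot status output of crm_mon lists joined nodes on a line containing "Online:", which may differ between Pacemaker versions.

```shell
#!/bin/sh
# wait_until TRIES DELAY COMMAND [ARGS...]
# Runs COMMAND up to TRIES times, sleeping DELAY seconds between attempts.
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    attempt=0
    while :; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        if [ "$attempt" -ge "$tries" ]; then
            return 1
        fi
        sleep "$delay"
    done
}

# nodes_online NAME...
# Reads cluster status text on stdin and succeeds only if every NAME
# appears as a separate word on an "Online:" line (assumed output format).
nodes_online() {
    online=$(grep 'Online:' || true)
    for node in "$@"; do
        case " $online " in
            *" $node "*) ;;
            *) return 1 ;;
        esac
    done
}

# Ties both helpers to the node names used in this guide.
cluster_ready() {
    crm_mon -1 | nodes_online test-cluster-01 test-cluster-02
}

# Usage (as root): wait_until 30 2 cluster_ready && echo "cluster formed"
```

Keeping the parsing in nodes_online separate from the crm_mon call makes it possible to try the check against saved status output without a running cluster.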
Configure a qemu-colo cluster resource
Create images on all nodes:
# qemu-img create -f qcow2 /mnt/vms/vma.qcow2 10g
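The command above assumes that the directory /mnt/vms already exists on each node. A minimal sketch of a wrapper that creates the directory first and leaves an existing image untouched is shown here. The function name create_colo_image and the QEMU_IMG override variable are hypothetical and exist only for this illustration.

```shell
#!/bin/sh
# create_colo_image DIR NAME SIZE
# Creates DIR if needed, then creates DIR/NAME.qcow2 with the given SIZE
# unless a file with that name is already present.
create_colo_image() {
    vm_dir=$1
    name=$2
    size=$3
    img="$vm_dir/$name.qcow2"
    mkdir -p "$vm_dir" || return 1
    if [ -e "$img" ]; then
        echo "keeping existing image: $img"
        return 0
    fi
    # QEMU_IMG is a hypothetical override for dry runs (e.g. QEMU_IMG=echo);
    # it defaults to the real qemu-img binary.
    "${QEMU_IMG:-qemu-img}" create -f qcow2 "$img" "$size"
}

# Usage (as root, on every node): create_colo_image /mnt/vms vma 10g
```

Setting QEMU_IMG=echo prints the qemu-img arguments instead of creating the image, which is a quick way to check the paths before running it for real.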
Show user guide of the resource agent for explanation of parameters and more:
# crm ra info ocf:qemu:colo
Configure the resource (on one node only):
# crm configure primitive vma ocf:qemu:colo \
    meta target-role=Stopped \
    params active_hidden_dir="/mnt/vms" \
    options="-vnc :0 -enable-kvm -cpu qemu64,+kvmclock -m 512 -netdev bridge,br=br0,id=hn0 -device e1000,netdev=hn0 -device virtio-blk,drive=colo-disk0 -drive if=none,node-name=parent0,format=qcow2,file=/mnt/vms/vma.qcow2" \
    op start timeout=30s interval=0 \
    op stop timeout=10s interval=0 \
    op monitor role=Master interval=1000ms timeout=30s \
    op monitor role=Slave interval=1001ms timeout=30s \
    op notify timeout=30s interval=0 \
    op promote timeout=30s interval=0 \
    op demote timeout=120s interval=0
# crm configure clone vma_ms vma \
    meta promotable=true clone-max=2 promoted-max=1 notify=true target-role=Started
# crm_master -r vma -v 10
Show cluster status:
# crm_mon
The resource should be 'Master' on one node and 'Slave' on the other.
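The role check can also be scripted instead of being read off the interactive display. The sketch below makes an assumption about the output format: it expects the one-shot crm_mon output to contain lines of the form "Masters: [ node ]" and "Slaves: [ node ]", which may not hold for every Pacemaker version (newer releases may use different role names). The function name colo_roles is hypothetical.

```shell
#!/bin/sh
# colo_roles
# Reads `crm_mon -1` output on stdin, prints the node lists found on the
# "Masters:" and "Slaves:" lines, and fails unless both lists are non-empty.
# The line format matched here is an assumption, see the note above.
colo_roles() {
    awk '
        /Masters: \[/ { m = $0; sub(/.*Masters: \[ */, "", m); sub(/ *\].*/, "", m) }
        /Slaves: \[/  { s = $0; sub(/.*Slaves: \[ */, "", s);  sub(/ *\].*/, "", s) }
        END {
            printf "master=%s slave=%s\n", m, s
            # Exit status 1 when either role has no node, 0 otherwise.
            exit (m == "" || s == "")
        }'
}

# Usage (as root): crm_mon -1 | colo_roles
```

A non-zero exit status means that at least one of the two roles was not found, for example while the secondary node is still starting or resyncing.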
For detailed error messages and resync status, look at the system log:
# journalctl -f