Features/COLO/Managed HOWTO

From QEMU
Revision as of 18:32, 6 June 2020

Overview

This step-by-step guide shows how to install qemu-colo, set up a Pacemaker cluster, and configure and run a qemu-colo cluster resource. The Pacemaker setup described here is minimal and should not be used in production. For more information about Pacemaker configuration, see the Pacemaker (https://clusterlabs.org/pacemaker/doc/) and Corosync (https://manpages.debian.org/buster/corosync/corosync.conf.5.en.html) documentation.

This guide assumes two cluster nodes with the following IP addresses:

test-cluster-01 192.168.220.244
test-cluster-02 192.168.220.245

Setup

$ = run as normal user
# = run as root

On every node do the following:

Install Debian buster amd64 (https://www.debian.org/distrib/).

Install packages:

# apt-get -y install git build-essential wget nano bridge-utils corosync pacemaker crmsh python3 pkg-config libglib2.0-dev libpixman-1-dev
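
A quick way to confirm that the build tools ended up on the PATH is sketched below. The helper name missing_commands is hypothetical and not part of any package; the commands checked are a small sample of what the packages above are expected to provide.

```shell
# Print the name of every listed command that is not on PATH.
# Returns non-zero if at least one command is missing.
missing_commands() {
    _missing=0
    for _cmd in "$@"; do
        if ! command -v "$_cmd" >/dev/null 2>&1; then
            echo "$_cmd"
            _missing=1
        fi
    done
    return "$_missing"
}

# Sample of commands expected from the packages installed above.
missing_commands git make wget pkg-config python3 \
    || echo "some commands are missing, re-check the package installation" >&2
```

The function prints nothing when every command is found.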

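Work around a Debian bug (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=960271) by installing an older linux-libc-dev package: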

# wget https://snapshot.debian.org/archive/debian/20200129T091834Z/pool/main/l/linux/linux-libc-dev_4.19.98-1_amd64.deb
# dpkg -i linux-libc-dev_4.19.98-1_amd64.deb 

Install qemu:

$ git clone --single-branch --depth 1 -b new_build https://github.com/Lukey3332/qemu.git
$ cd qemu
$ ./configure --target-list=x86_64-softmmu,i386-softmmu --enable-replication --enable-colo-ra --enable-kvm --prefix=/usr
$ make -j4; make
# make install

Configure networking:

Replace /etc/network/interfaces with the following. Adjust eth0 and the IP address as needed for the node.

auto lo
iface lo inet loopback

iface eth0 inet manual

auto br0
iface br0 inet static
 mtu 1500
 bridge_ports eth0
 address 192.168.220.244
 netmask 255.255.255.0
 gateway 192.168.220.1
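
Because only the interface name and the address differ between nodes, the file above can also be generated. The following is a minimal sketch; print_interfaces is a hypothetical helper, and the /24 netmask and the br0 bridge name are taken from the example above.

```shell
# Print an /etc/network/interfaces file for one node.
# $1 = physical interface, $2 = node address, $3 = gateway
print_interfaces() {
    cat <<EOF
auto lo
iface lo inet loopback

iface $1 inet manual

auto br0
iface br0 inet static
 mtu 1500
 bridge_ports $1
 address $2
 netmask 255.255.255.0
 gateway $3
EOF
}

# Example for test-cluster-02; review the output before installing it.
print_interfaces eth0 192.168.220.245 192.168.220.1
```

Once the output looks right, it can be redirected to /etc/network/interfaces as root.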

Configure your DNS server in /etc/resolv.conf:

nameserver 192.168.220.1

Apply changes:

# ifdown eth0
# ifup br0

Configure local DNS:

Replace /etc/hosts on test-cluster-01 with the following:

127.0.0.1       localhost
127.0.1.1       test-cluster-01.home.intra  test-cluster-01

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

192.168.220.245 test-cluster-02.home.intra  test-cluster-02

Replace /etc/hosts on test-cluster-02 with the following:

127.0.0.1       localhost
127.0.1.1       test-cluster-02.home.intra  test-cluster-02

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

192.168.220.244 test-cluster-01.home.intra  test-cluster-01
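
To check that a hosts file maps the peer's name to the expected address, a small lookup like the one below may help. hosts_lookup is a hypothetical helper that reads hosts-format text on stdin; it ignores full-line comments and does not handle trailing comments.

```shell
# Print the address that a hosts-format file (read from stdin) assigns to $1.
hosts_lookup() {
    awk -v name="$1" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        {
            # Field 1 is the address; the remaining fields are names.
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    '
}

# Example with the relevant lines from test-cluster-01.
hosts_lookup test-cluster-02 <<'EOF'
127.0.0.1       localhost
127.0.1.1       test-cluster-01.home.intra  test-cluster-01
192.168.220.245 test-cluster-02.home.intra  test-cluster-02
EOF
```

On a configured node the same check would be run as hosts_lookup test-cluster-02 < /etc/hosts.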


Configure corosync:

Replace /etc/corosync/corosync.conf with the following:

# Please read the corosync.conf.5 manual page
totem {
        version: 2

        cluster_name: test-cluster
}

logging {
        # Log the source file and line where messages are being
        # generated. When in doubt, leave off. Potentially useful for
        # debugging.
        fileline: off
        # Log to standard error. When in doubt, set to yes. Useful when
        # running in the foreground (when invoking "corosync -f")
        to_stderr: yes
        # Log to a log file. When set to "no", the "logfile" option
        # must not be set.
        to_logfile: yes
        logfile: /var/log/corosync/corosync.log
        # Log to the system log daemon. When in doubt, set to yes.
        to_syslog: yes
        # Log debug messages (very verbose). When in doubt, leave off.
        debug: off
        # Log messages with time stamps. When in doubt, set to hires (or on)
        #timestamp: hires
        logger_subsys {
                subsys: QUORUM
                debug: off
        }
}

quorum {
        # Enable and configure quorum subsystem (default: off)
        # see also corosync.conf.5 and votequorum.5
        provider: corosync_votequorum
        two_node: 1
}

nodelist {

        node {
                # Hostname of the node
                name: test-cluster-01
                # Cluster membership node identifier
                nodeid: 1

                ring0_addr: 192.168.220.244
        }
        node {
                # Hostname of the node
                name: test-cluster-02
                # Cluster membership node identifier
                nodeid: 2

                ring0_addr: 192.168.220.245
        }
}
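
Both nodes need the same node list, so it can be worth listing the name and address pairs from each node's file and comparing them. The sketch below uses a hypothetical helper, corosync_nodes, and assumes that each node block states name before ring0_addr, as in the file above.

```shell
# Read a corosync.conf on stdin and print "name address" for every node entry.
corosync_nodes() {
    awk '
        $1 == "name:"       { name = $2 }         # remember the current node name
        $1 == "ring0_addr:" { print name, $2 }    # emit it with its ring address
    '
}

# Example with a shortened node list.
corosync_nodes <<'EOF'
nodelist {
        node {
                name: test-cluster-01
                nodeid: 1
                ring0_addr: 192.168.220.244
        }
        node {
                name: test-cluster-02
                nodeid: 2
                ring0_addr: 192.168.220.245
        }
}
EOF
```

On a configured node the input would be /etc/corosync/corosync.conf; the output should be identical on both nodes.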

Apply changes:

# systemctl enable corosync
# systemctl restart corosync
# systemctl enable pacemaker
# systemctl restart pacemaker
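
The services may need a few seconds before the cluster answers queries. A generic retry loop such as the following sketch can be used to wait; retry is a hypothetical helper, and the assumption that crm_mon -1 exits non-zero while the cluster is unreachable should be verified on your installation.

```shell
# Run the given command up to $1 times, one second apart, until it succeeds.
# Usage: retry TRIES COMMAND [ARGS...]
retry() {
    _tries=$1
    shift
    while [ "$_tries" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        _tries=$((_tries - 1))
        if [ "$_tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Placeholder command for illustration; on a cluster node one might instead run:
#   retry 30 crm_mon -1 >/dev/null
retry 3 true && echo "ready"
```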

Configure a qemu-colo cluster resource

Create images on all nodes (create the /mnt/vms directory first if it does not exist):

# mkdir -p /mnt/vms
# qemu-img create -f qcow2 /mnt/vms/vma.qcow2 10g

Show user guide of the resource agent for explanation of parameters and more:

# crm ra info ocf:qemu:colo

Configure the resource (on one node only):

# crm configure primitive vma ocf:qemu:colo \
       meta target-role=Stopped \
       params active_hidden_dir="/mnt/vms" \
       options="-vnc :0 -enable-kvm -cpu qemu64,+kvmclock -m 512 -netdev bridge,br=br0,id=hn0 -device e1000,netdev=hn0 -device virtio-blk,drive=colo-disk0 -drive if=none,node-name=parent0,format=qcow2,file=/mnt/vms/vma.qcow2" \
       op start timeout=30s interval=0 \
       op stop timeout=10s interval=0 \
       op monitor role=Master interval=1000ms timeout=30s \
       op monitor role=Slave interval=1001ms timeout=30s \
       op notify timeout=30s interval=0 \
       op promote timeout=30s interval=0 \
       op demote timeout=120s interval=0

# crm configure clone vma_ms vma \
       meta promotable=true clone-max=2 promoted-max=1 notify=true target-role=Started

# crm_master -r vma -v 10

Show cluster status:

# crm_mon

The resource should be 'Master' on one node and 'Slave' on the other.
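
For scripted checks, the one-shot output of crm_mon -1 can be inspected instead of watching the interactive view. The sketch below assumes that the output contains lines starting with "Masters:" and "Slaves:" followed by a bracketed node list, which is how the Pacemaker version in Debian buster is expected to print promotable clones; the helper name has_master_and_slave and the sample text are illustrative only.

```shell
# Read "crm_mon -1" style output on stdin.
# Succeed only if it has a non-empty Masters: line and a non-empty Slaves: line.
has_master_and_slave() {
    awk '
        $1 == "Masters:" && NF > 3 { master = 1 }    # e.g. "Masters: [ node ]"
        $1 == "Slaves:"  && NF > 3 { slave = 1 }
        END { exit !(master && slave) }
    '
}

# Illustrative sample of the assumed output format.
has_master_and_slave <<'EOF' && echo "COLO pair is up"
 Clone Set: vma_ms [vma] (promotable)
     Masters: [ test-cluster-01 ]
     Slaves: [ test-cluster-02 ]
EOF
```

On a cluster node the input would come from crm_mon -1 piped into the function.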

For detailed error messages and resync status, look at the system log:

# journalctl -f