Testing/CI/KubernetesRunners

To run GitLab CI jobs on a Kubernetes cluster, a GitLab Runner must be installed [1].

Deployment

This section documents the steps taken to deploy a GitLab Runner instance on an Azure Kubernetes cluster using Helm [2].

Kubernetes Cluster

Create a Kubernetes cluster on Azure (AKS) with two node pools: "agentpool" for the Kubernetes system pods and "jobs" for the CI jobs.
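
As a rough sketch, the "jobs" pool could be added to an existing cluster with the Azure CLI (installed in the next section) along the following lines. The resource group, cluster name, and node counts are placeholders that do not come from this page; adjust them to your setup. The `--mode User` flag is what makes the pool match the "kubernetes.azure.com/mode" = "user" node selector used later in values.yaml.

```shell
# Placeholder names and sizes (assumptions, not taken from this page).
RESOURCE_GROUP="${RESOURCE_GROUP:-my-resource-group}"
CLUSTER_NAME="${CLUSTER_NAME:-my-aks-cluster}"
MAX_NODES="${MAX_NODES:-10}"

# Add a user-mode node pool named "jobs" with the cluster autoscaler enabled.
add_jobs_pool() {
  az aks nodepool add \
    --resource-group "$RESOURCE_GROUP" \
    --cluster-name "$CLUSTER_NAME" \
    --name jobs \
    --mode User \
    --node-count 1 \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count "$MAX_NODES"
}
```

Call `add_jobs_pool` after signing in to Azure.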

CLI

Follow the Azure documentation to install the Azure CLI.

Alternatively, run the Azure CLI in a container [3]:

podman run -it mcr.microsoft.com/azure-cli

Install the Kubernetes CLI (kubectl) [4]:

az aks install-cli

Install the Helm CLI [5]:

curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Sign in

Sign in to Azure [6]:

az login

Connect to your Kubernetes cluster. Open the Azure web dashboard for your cluster and click the "Connect" button. A list of commands to connect to your cluster will be displayed. They look something like the following:

az account set --subscription ...
az aks get-credentials ...
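
To confirm that the credentials work, one option is to print the active context and list the nodes together with their AKS mode label. This is only a sketch; the function name is made up for illustration, and the label column is the same "kubernetes.azure.com/mode" label that the runner configuration below selects on.

```shell
# Print the current kubectl context, then list nodes with their AKS mode label
# ("system" or "user"). check_cluster_access is a hypothetical helper name.
check_cluster_access() {
  kubectl config current-context &&
  kubectl get nodes --label-columns kubernetes.azure.com/mode
}
```

Nodes from the "jobs" pool should show "user" in the extra column.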

GitLab

Register the new runner in GitLab [7]. The registration token is needed for the runnerRegistrationToken setting in values.yaml.

GitLab Runner

Now install the GitLab Runner with Helm [8].

Add the GitLab Helm repository:

helm repo add gitlab https://charts.gitlab.io

Create a namespace:

kubectl create namespace "gitlab-runner"

Create a values.yaml file for your runner configuration [9].

The current values.yaml file can be found in the QEMU main repository: scripts/ci/gitlab-kubernetes-runners/values.yaml

The default poll_timeout value needs to be raised so that auto-scaled nodes have time to start before the runner gives up waiting for the job pod [10].

Enabling RBAC support [11] seems to be needed [12] with the default AKS configuration.

gitlabUrl: "https://gitlab.com/"
runnerRegistrationToken: ""
rbac:
  create: true
# Configure the maximum number of concurrent jobs
concurrent: 200
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        # Seconds to wait for a job pod; raised so that auto-scaled nodes can start
        poll_timeout = 1200
      # Schedule job pods on "user" nodes (not "system")
      [runners.kubernetes.node_selector]
        "kubernetes.azure.com/mode" = "user"

Deploy the runner:

helm install --namespace gitlab-runner runner-manager -f values.yaml gitlab/gitlab-runner
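
After the install, a quick check along these lines can show whether the release and the runner manager pod came up. This is a sketch using the namespace and release name from the command above; the function name is hypothetical.

```shell
# Namespace and release name as used in the helm install command above.
NAMESPACE="${NAMESPACE:-gitlab-runner}"
RELEASE="${RELEASE:-runner-manager}"

# Show the Helm release status, then the pods running in the runner namespace.
check_runner_release() {
  helm status "$RELEASE" --namespace "$NAMESPACE" &&
  kubectl get pods --namespace "$NAMESPACE"
}
```

The runner should also appear as online in the GitLab runner list once its pod is running.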

If you change the configuration in values.yaml, apply it with the command below. Pause your runner before upgrading it to avoid service disruptions. [13]

helm upgrade --namespace gitlab-runner runner-manager -f values.yaml gitlab/gitlab-runner
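
If you prefer a single command for both the first deployment and later changes, Helm's `--install` flag on `helm upgrade` installs the release when it does not exist yet. The wrapper below is a sketch; the function name and the VALUES_FILE variable are made up for illustration.

```shell
# Release details as used on this page; VALUES_FILE is a hypothetical variable.
NAMESPACE="${NAMESPACE:-gitlab-runner}"
RELEASE="${RELEASE:-runner-manager}"
VALUES_FILE="${VALUES_FILE:-values.yaml}"

# Refresh the chart index, then install the release or upgrade it if it exists.
deploy_runner() {
  helm repo update &&
  helm upgrade --install "$RELEASE" gitlab/gitlab-runner \
    --namespace "$NAMESPACE" \
    -f "$VALUES_FILE"
}
```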

Docker

QEMU jobs require Docker-in-Docker. Additional configuration is necessary. [14]

Docker-in-Docker makes the CI environment less secure [15], needs more resources, and has known issues. Please migrate your Docker jobs to better alternatives if you can [16].

Add the following to your values.yaml:

runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        image = "ubuntu:20.04"
        privileged = true
      [[runners.kubernetes.volumes.empty_dir]]
        name = "docker-certs"
        mount_path = "/certs/client"
        medium = "Memory"

Update your job definitions to use the following. Alternatively, variables can be set using the runner environment configuration [17].

image: docker:20.10.16
services:
  - docker:20.10.16-dind
variables:
  DOCKER_HOST: tcp://docker:2376
  DOCKER_TLS_CERTDIR: "/certs"
  DOCKER_TLS_VERIFY: 1
  DOCKER_CERT_PATH: "$DOCKER_TLS_CERTDIR/client"
before_script:
  - until docker info; do sleep 1; done
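
The `until` loop above waits indefinitely if the Docker daemon never becomes reachable, so the job only ends when it hits its timeout. A bounded variant could look like the sketch below. The `wait_for` helper is hypothetical and not part of the QEMU configuration; it uses only POSIX shell features.

```shell
# Retry a command once per second until it succeeds or the attempt budget runs out.
# Usage: wait_for MAX_ATTEMPTS COMMAND [ARGS...]
wait_for() {
  max_attempts=$1
  shift
  attempt=1
  until "$@" >/dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
}
```

For example, `wait_for 60 docker info` fails the job after roughly a minute instead of waiting forever; the limit of 60 is an arbitrary choice.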

Resource Management

The QEMU pipeline has around 100 jobs. Most of them can run in parallel. Each job needs enough resources to complete before it times out.

To understand Kubernetes resource units, see "Resource Management for Pods and Containers" in the Kubernetes documentation.

Set requests:

runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        cpu_request = "0.5"
        service_cpu_request = "0.5"
        helper_cpu_request = "0.25"
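
To get a feel for what these requests mean for cluster sizing, the arithmetic can be sketched as follows. It assumes every job pod has a build container, a helper container, and exactly one service container (such as the dind service); jobs without services request less.

```shell
# CPU requests per job pod, in millicores, taken from the values above.
BUILD_MILLICORES=500     # cpu_request = "0.5"
SERVICE_MILLICORES=500   # service_cpu_request = "0.5" (one service container assumed)
HELPER_MILLICORES=250    # helper_cpu_request = "0.25"

# Print the total CPU request, in millicores, for a given number of concurrent jobs.
total_cpu_millicores() {
  job_count=$1
  echo $(( job_count * (BUILD_MILLICORES + SERVICE_MILLICORES + HELPER_MILLICORES) ))
}

total_cpu_millicores 100   # prints 125000, i.e. 125 CPUs
```

Under these assumptions, running about 100 jobs in parallel needs roughly 125 CPUs of schedulable capacity across the "jobs" pool.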

Jobs with higher requirements should set their own request variables in the job definition (see "Overwrite container resources" in the GitLab Runner documentation). To allow individual jobs to request more resources, set the corresponding _overwrite_max_allowed options in the runner configuration [18]:

runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        cpu_request_overwrite_max_allowed = "7"
        memory_request_overwrite_max_allowed = "30Gi"
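
With those limits in place, a single job can raise its own requests through the KUBERNETES_CPU_REQUEST and KUBERNETES_MEMORY_REQUEST variables described in the GitLab Runner documentation. The snippet below prints an illustrative fragment for .gitlab-ci.yml; the job name "heavy-build" and the values are hypothetical, and the values must stay at or below the configured maximums.

```shell
# Print a job fragment to merge into .gitlab-ci.yml.
# "heavy-build" and the requested amounts are hypothetical examples.
print_job_override() {
  cat <<'EOF'
heavy-build:
  variables:
    KUBERNETES_CPU_REQUEST: "4"
    KUBERNETES_MEMORY_REQUEST: "8Gi"
EOF
}

print_job_override
```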