Testing/CI/KubernetesRunners

To run GitLab CI jobs on a Kubernetes cluster, a GitLab Runner must be installed [1].

Deployment

This section documents the steps taken to deploy a GitLab Runner instance on an Azure Kubernetes cluster using Helm [2].

Kubernetes Cluster

Create a Kubernetes cluster on Azure (AKS) with two node pools: "agentpool" for the Kubernetes system pods and "jobs" for the CI jobs.
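
The cluster can be created from the Azure portal. As a rough sketch, the same setup could also be done with the Azure CLI (installed in the next section); the resource group, cluster name, VM sizes and node counts below are placeholders:

# Resource group and cluster
az group create --name qemu-ci --location westeurope
# "agentpool": system node pool for the Kubernetes system pods
az aks create --resource-group qemu-ci --name qemu-ci-aks \
    --nodepool-name agentpool --node-count 1 --node-vm-size Standard_D2s_v3
# "jobs": user node pool for the CI jobs, with the cluster autoscaler enabled
az aks nodepool add --resource-group qemu-ci --cluster-name qemu-ci-aks \
    --name jobs --mode User --node-vm-size Standard_D8s_v3 \
    --enable-cluster-autoscaler --min-count 0 --max-count 10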

CLI

Follow the docs to Install the Azure CLI.

Alternatively, run the Azure CLI in a container [3]:

podman run -it mcr.microsoft.com/azure-cli

Install the Kubernetes CLI (kubectl) [4]:

az aks install-cli

Install the Helm CLI [5]:

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
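
To confirm the tools are installed, check their versions, for example:

az version
kubectl version --client
helm version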

Sign in

Sign in to Azure [6]:

az login

Connect to your Kubernetes cluster: open the Azure web dashboard for your cluster and click the "Connect" button. A list of commands for connecting to your cluster will be displayed, similar to the following:

az account set --subscription ...
az aks get-credentials ...
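
A quick check confirms that kubectl talks to the right cluster and shows the node pools; the extra label column assumes the kubernetes.azure.com/mode node label used by the runner configuration below:

kubectl get nodes -L kubernetes.azure.com/mode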

GitLab

Register the new runner [7].

GitLab Runner

Now it's time to install the GitLab Runner with Helm [8].

Add the GitLab Helm repository:

helm repo add gitlab https://charts.gitlab.io

Create a namespace:

kubectl create namespace "gitlab-runner"

Create a values.yaml file for your runner configuration [9].

The current values.yaml file can be found in the QEMU main repository: scripts/ci/gitlab-kubernetes-runners/values.yaml

The default poll_timeout value needs to be raised to give auto-scaling nodes time to start [10].

Enabling RBAC support [11] seems to be needed [12] with the default AKS configuration.

gitlabUrl: "https://gitlab.com/"
runnerRegistrationToken: ""
rbac:
  create: true
# Configure the maximum number of concurrent jobs
concurrent: 200
# Schedule runners on "user" nodes (not "system")
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        poll_timeout = "1200" 
      [runners.kubernetes.node_selector]
        "kubernetes.azure.com/mode" = "user"

Deploy the runner:

helm install --namespace gitlab-runner runner-manager -f values.yaml gitlab/gitlab-runner

If you change the configuration in values.yaml, apply it with the command below. Pause your runner before upgrading to avoid service disruptions [13].

helm upgrade --namespace gitlab-runner runner-manager -f values.yaml gitlab/gitlab-runner
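
After an install or upgrade, you can check that the Helm release and the runner manager pod are up:

helm list --namespace gitlab-runner
kubectl get pods --namespace gitlab-runner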

Docker

QEMU jobs require Docker-in-Docker, which needs additional configuration [14].

Docker-in-Docker makes the CI environment less secure [15], requires more resources, and has known issues. Please migrate your Docker jobs to better alternatives if you can [16].

Add the following to your values.yaml:

runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        image = "ubuntu:20.04"
        privileged = true
      [[runners.kubernetes.volumes.empty_dir]]
        name = "docker-certs"
        mount_path = "/certs/client"
        medium = "Memory"

Update your job definitions to use the following. Alternatively, variables can be set using the runner environment configuration [17].

image: docker:20.10.16
services:
  - docker:20.10.16-dind
variables:
  DOCKER_HOST: tcp://docker:2376
  DOCKER_TLS_CERTDIR: "/certs"
  DOCKER_TLS_VERIFY: 1
  DOCKER_CERT_PATH: "$DOCKER_TLS_CERTDIR/client"
before_script:
  - until docker info; do sleep 1; done
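
If the keys above are placed at the top level of .gitlab-ci.yml they act as defaults for every job, so a hypothetical Docker job only needs its own script; the job name and command below are made up:

# Hypothetical job; it inherits image, services, variables and
# before_script from the top-level defaults shown above
docker-smoke-test:
  script:
    - docker run --rm alpine echo "docker-in-docker works"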

Resource Management

The QEMU pipeline has around 100 jobs, most of which can run in parallel. Each job needs enough resources to complete before it times out.

To understand Kubernetes resource measure units, see Resource Management for Pods and Containers.

Set requests:

runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        cpu_request = "0.5"      
        service_cpu_request = "0.5"
        helper_cpu_request = "0.25"     

Jobs that have higher requirements should set their own variables; see overwrite container resources. To allow individual jobs to request more resources, you have to set the _overwrite_max_allowed variables [18].

runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        cpu_request_overwrite_max_allowed = "7"        
        memory_request_overwrite_max_allowed = "30Gi"
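
With those limits in place, an individual job can raise its own requests through the KUBERNETES_* CI variables; the job name and values below are made up:

# Hypothetical job that needs more CPU and memory than the defaults,
# staying within the *_overwrite_max_allowed bounds configured above
heavy-build:
  variables:
    KUBERNETES_CPU_REQUEST: "4"
    KUBERNETES_MEMORY_REQUEST: "8Gi"
  script:
    - make -j"$(nproc)"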