Build Your Own GitLab Runners on EKS
As more and more DevOps teams adopt GitLab for internal development and lean on GitLab CI/CD pipelines, finding ways to manage costs while still delivering highly available, scalable pipelines becomes crucial.
This is Part 1 of a 4-part series on building production-ready GitLab runners on AWS. In this foundational post, I'll show you how to leverage EKS Auto Mode with Terraform and Helm to create a robust, scalable runner infrastructure.
Why GitLab Runners on Kubernetes?¶
At a high level, a GitLab runner is an agent that executes the CI jobs in your pipelines. Runners can be installed directly on Linux, Windows, or macOS hosts, or run jobs through executors such as Docker or Kubernetes.
GitLab offers a great selection of SaaS (hosted) runners and a generous allowance of included compute minutes (10,000 per month at the time of writing), but you would be surprised how quickly development teams can burn through it with heavy test suites, pipeline troubleshooting, and the like.
The Kubernetes Advantage¶
Kubernetes adds scalability and elasticity on top of that, plus painless upgrades with Helm. If you have seen my other post on Karpenter and Why You Should Ditch Cluster Autoscaler, then you already have an inkling of how much scalability and flexibility EKS can give your jobs.
Why EKS Auto Mode Changes Everything¶
EKS Auto Mode is a game-changer for GitLab runners because it:
- ✅ Automatically manages Karpenter - no manual Karpenter installation or configuration
- ✅ Optimizes node provisioning - intelligent instance selection and scaling
- ✅ Reduces operational overhead - AWS handles the complex stuff
- ✅ Provides enterprise-grade reliability - built-in best practices
- ✅ Simplifies cost management - automatic Spot integration and optimization
Series Overview¶
This is a comprehensive 4-part series that will take you from zero to production-ready GitLab runners:
Part 1: Build Your Own GitLab Runners on EKS Auto Mode (This Post)¶
- EKS Auto Mode setup with Terraform
- GitLab Runner Helm installation and configuration
- Basic runner deployment and registration
- Testing your first pipeline
Part 2: Advanced GitLab Runner Security with IRSA (Coming Next)¶
- Deep dive into IAM Roles for Service Accounts (IRSA)
- Secure AWS permissions without storing credentials
- Multi-environment security strategies
- CI/CD security best practices
Part 3: Optimizing GitLab Runner Performance with S3 Caching (Coming Soon)¶
- S3-backed distributed caching setup
- Cache strategies for different workload types
- Performance benchmarking and optimization
- Cost-effective cache management
Part 4: Smart SPOT vs ON-DEMAND Runner Strategies (Coming Soon)¶
- Advanced node selection strategies
- Runner tagging for workload separation
- Mixed SPOT/ON-DEMAND configurations
- Reliability vs cost optimization
Architecture Overview¶
Here's what we'll build in this first part:
graph TB
A[GitLab.com] --> B[GitLab Runner Pods]
B --> C[EKS Auto Mode Cluster]
C --> D[Karpenter - Managed by AWS]
D --> E[EC2 Spot Instances]
D --> F[EC2 On-Demand Instances]
G[Terraform] --> C
G --> H[Helm Chart]
H --> B
I[Job Pods] --> C
B --> I
Prerequisites¶
Before we dive in, make sure you have:
- AWS CLI configured with appropriate permissions
- Terraform >= 1.5 installed
- kubectl and Helm 3.x installed
- GitLab account with a project for testing
- Basic understanding of Kubernetes and CI/CD concepts
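A quick way to confirm the tooling is in place before you start:
# Verify required tooling
aws --version
terraform version
kubectl version --client
helm version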
Cost Considerations
EKS Auto Mode adds a management fee on top of the standard $0.10/hour EKS control plane price, billed per managed instance, but the operational savings and automatic optimizations often make this worthwhile for production workloads.
Setting Up EKS Auto Mode with Terraform¶
Let's start by creating our EKS Auto Mode cluster. This is the foundation that will run our GitLab runners.
Project Structure¶
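To keep things easy to follow, everything in this post lives in one small Terraform module; a minimal layout (the file names here are simply the ones used throughout the post) might look like:
gitlab-runners-eks/
├── eks.tf
├── iam.tf
├── variables.tf
├── outputs.tf
├── terraform.tfvars
└── gitlab-runner-values.yaml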
Core EKS Auto Mode Configuration¶
# eks.tf
resource "aws_eks_cluster" "gitlab_runners" {
  name     = var.cluster_name
  role_arn = aws_iam_role.eks_cluster.arn
  version  = var.kubernetes_version

  # EKS Auto Mode needs API authentication and no self-managed add-ons
  bootstrap_self_managed_addons = false
  access_config {
    authentication_mode = "API"
  }

  # EKS Auto Mode configuration - instance selection is handled by the
  # built-in node pools, so no instance types are listed here
  compute_config {
    enabled       = true
    node_pools    = ["system", "general-purpose"]
    node_role_arn = aws_iam_role.eks_node.arn # node role defined in iam.tf
  }

  # Auto Mode also expects its block storage and load balancing capabilities enabled
  kubernetes_network_config {
    elastic_load_balancing {
      enabled = true
    }
  }
  storage_config {
    block_storage {
      enabled = true
    }
  }

  vpc_config {
    subnet_ids              = var.subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = true
    public_access_cidrs     = var.public_access_cidrs
  }

  # Enable logging for troubleshooting
  enabled_cluster_log_types = [
    "api", "audit", "authenticator", "controllerManager", "scheduler"
  ]

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
    aws_iam_role_policy_attachment.eks_auto_mode,
  ]

  tags = merge(var.tags, {
    Name    = var.cluster_name
    Purpose = "GitLab-Runners"
  })
}
Required IAM Configuration¶
# iam.tf
resource "aws_iam_role" "eks_cluster" {
  name = "${var.cluster_name}-cluster-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Auto Mode needs sts:TagSession in addition to sts:AssumeRole
        Action = ["sts:AssumeRole", "sts:TagSession"]
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster.name
}

# Auto Mode requires additional managed policies on the cluster role
resource "aws_iam_role_policy_attachment" "eks_auto_mode" {
  for_each = toset([
    "AmazonEKSComputePolicy",
    "AmazonEKSBlockStoragePolicy",
    "AmazonEKSLoadBalancingPolicy",
    "AmazonEKSNetworkingPolicy",
  ])
  policy_arn = "arn:aws:iam::aws:policy/${each.value}"
  role       = aws_iam_role.eks_cluster.name
}
Variables Configuration¶
# variables.tf
variable "cluster_name" {
  description = "Name of the EKS cluster"
  type        = string
  default     = "gitlab-runners-eks"
}

variable "kubernetes_version" {
  description = "Kubernetes version"
  type        = string
  default     = "1.30"
}

variable "subnet_ids" {
  description = "List of subnet IDs for the EKS cluster"
  type        = list(string)
}

variable "public_access_cidrs" {
  description = "List of CIDR blocks for API server access"
  type        = list(string)
  default     = ["0.0.0.0/0"] # Restrict this in production!
}

variable "tags" {
  description = "Tags to apply to resources"
  type        = map(string)
  default     = {}
}
Deploying the Infrastructure¶
# Initialize and apply Terraform
terraform init
terraform plan
terraform apply
# Configure kubectl
aws eks update-kubeconfig --region us-west-2 --name gitlab-runners-eks
# Verify cluster is ready
kubectl get nodes
kubectl get pods -A
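If you want to confirm Auto Mode is actually enabled, you can query the cluster's compute configuration; the --query path below assumes the computeConfig field that describe-cluster returns for Auto Mode clusters.
# Confirm EKS Auto Mode is enabled on the cluster
aws eks describe-cluster --region us-west-2 --name gitlab-runners-eks \
  --query 'cluster.computeConfig'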
Installing GitLab Runner with Helm¶
Now let's install the GitLab Runner using the official Helm chart, optimized for our EKS Auto Mode setup.
Adding the GitLab Helm Repository¶
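First, add the official GitLab chart repository and refresh your local index:
# Add the GitLab Helm repository
helm repo add gitlab https://charts.gitlab.io
helm repo update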
GitLab Runner Configuration¶
Create a values.yaml file for our runner configuration:
# gitlab-runner-values.yaml
gitlabUrl: https://gitlab.com/
runnerToken: "YOUR_RUNNER_TOKEN"  # Runner authentication token from GitLab (Settings → CI/CD → Runners)
concurrent: 10
checkInterval: 30

rbac:
  create: true
  serviceAccountName: gitlab-runner

runners:
  config: |
    [[runners]]
      name = "GitLab Runner - EKS Auto Mode"
      url = "https://gitlab.com/"
      # the token is injected from runnerToken above
      executor = "kubernetes"
      [runners.kubernetes]
        namespace = "gitlab-runner"
        image = "ubuntu:22.04"
        # Resource requests optimized for Auto Mode
        cpu_request = "100m"
        memory_request = "128Mi"
        cpu_limit = "2000m"
        memory_limit = "4Gi"
        # Helper image configuration
        helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest"
        # Node selection - let Auto Mode decide
        node_selector_overwrite_allowed = ".*"
        node_tolerations_overwrite_allowed = ".*"
        # Efficient pod cleanup
        cleanup_grace_period_seconds = 30
        # Enable service account overrides for the future IRSA setup
        service_account_overwrite_allowed = ".*"

# Resource limits for the runner pod itself
resources:
  limits:
    memory: 256Mi
    cpu: 200m
  requests:
    memory: 128Mi
    cpu: 100m

# Anti-affinity to spread runners across nodes
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - gitlab-runner
          topologyKey: kubernetes.io/hostname
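Before installing, you can render the chart locally to sanity-check the values file; this is plain helm template usage and applies nothing to the cluster.
# Render the chart locally to review the generated manifests
helm template gitlab-runner gitlab/gitlab-runner \
  --namespace gitlab-runner \
  --values gitlab-runner-values.yaml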
Deploy the GitLab Runner¶
# Create namespace
kubectl create namespace gitlab-runner
# Install GitLab Runner
helm install gitlab-runner gitlab/gitlab-runner \
--namespace gitlab-runner \
--values gitlab-runner-values.yaml
# Verify deployment
kubectl get pods -n gitlab-runner
kubectl logs -f deployment/gitlab-runner -n gitlab-runner
Registering Your Runner¶
To get your runner token:
- Go to your GitLab project
- Navigate to Settings → CI/CD
- Expand the Runners section
- Create a new project runner with the kubernetes tag and copy its runner authentication token (recent GitLab versions use this in place of the legacy registration token)
Update your values.yaml with the token and upgrade:
helm upgrade gitlab-runner gitlab/gitlab-runner \
--namespace gitlab-runner \
--values gitlab-runner-values.yaml
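If you'd rather not keep the token in the values file at all, Helm's --set flag can supply it at install or upgrade time instead:
# Pass the token on the command line instead of storing it in values.yaml
helm upgrade gitlab-runner gitlab/gitlab-runner \
  --namespace gitlab-runner \
  --values gitlab-runner-values.yaml \
  --set runnerToken="YOUR_RUNNER_TOKEN"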
Testing Your First Pipeline¶
Create a simple .gitlab-ci.yml to test your runner:
# .gitlab-ci.yml
stages:
  - test
  - build

variables:
  KUBERNETES_CPU_REQUEST: "100m"
  KUBERNETES_MEMORY_REQUEST: "128Mi"

test-job:
  stage: test
  script:
    - echo "Hello from EKS Auto Mode GitLab Runner!"
    - uname -a
    - echo "Node information:"
    - cat /proc/version
  # jobs only reach this runner if it carries the kubernetes tag (or accepts untagged jobs)
  tags:
    - kubernetes

build-job:
  stage: build
  script:
    - echo "Building application..."
    - sleep 30  # Simulate build time
    - echo "Build complete!"
  tags:
    - kubernetes
Monitoring and Verification¶
Check Runner Status in GitLab¶
- Go to Settings → CI/CD → Runners
- Your runner should appear with a green dot (online)
- Check the System ID and Last Contact information
Monitor Cluster Resources¶
# Watch nodes scale up/down
kubectl get nodes -w
# Monitor runner pods
kubectl get pods -n gitlab-runner -w
# Check job pods
kubectl get pods --all-namespaces | grep runner-
# View cluster events
kubectl get events --sort-by=.metadata.creationTimestamp
EKS Auto Mode Monitoring¶
EKS Auto Mode provides built-in observability. Check the AWS Console for:
- Node Pool Usage - How Auto Mode is selecting instances
- Scaling Events - When and why nodes are added/removed
- Cost Optimization - Spot vs On-Demand usage
- Resource Utilization - CPU and memory efficiency
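Auto Mode also surfaces its Karpenter-style NodePool and NodeClaim objects through the Kubernetes API, so you can watch provisioning decisions from kubectl as well:
# Inspect Auto Mode's built-in node pools and the nodes they have claimed
kubectl get nodepools
kubectl get nodeclaims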
What's Working Behind the Scenes¶
With EKS Auto Mode, here's what's automatically handled:
Intelligent Node Provisioning¶
- Instance Selection: Auto Mode chooses optimal instance types based on workload requirements
- Availability Zone Distribution: Automatic spread across AZs for reliability
- Spot Integration: Automatic Spot instance usage when appropriate
- Right-sizing: Nodes sized appropriately for your workloads
Operational Excellence¶
- Karpenter Management: AWS manages Karpenter installation and updates
- Security Patches: Automatic node AMI updates and patching
- Monitoring Integration: Built-in CloudWatch metrics and logging
- Cost Optimization: Continuous optimization of instance selection
Troubleshooting Common Issues¶
Runner Won't Register¶
# Check runner logs
kubectl logs deployment/gitlab-runner -n gitlab-runner
# Verify token and URL
kubectl describe secret gitlab-runner-secret -n gitlab-runner
Jobs Stuck in Pending¶
# Check for resource constraints
kubectl describe pod <job-pod-name>
# Verify node capacity
kubectl describe nodes
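If pods stay Pending, the scheduler's events usually explain why; this uses standard kubectl event filtering:
# Surface scheduling failures across all namespaces
kubectl get events -A --field-selector reason=FailedScheduling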
Slow Job Startup¶
Cold starts happen when Auto Mode has to bring up a fresh node for a job. This is where Auto Mode shines: job pods typically start in under 30 seconds thanks to fast, intelligent node provisioning.
Cost Optimization Tips¶
Even with this basic setup, you're already benefiting from:
- ✅ Automatic Spot Usage: Auto Mode uses Spot instances when safe
- ✅ Right-sized Nodes: no over-provisioning of compute resources
- ✅ Fast Scale-down: unused nodes terminated quickly
- ✅ Efficient Scheduling: jobs packed efficiently onto existing nodes
What's Next?¶
This foundational setup gives you a robust, scalable GitLab runner infrastructure on EKS Auto Mode. But we're just getting started!
Coming in Part 2: Advanced Security with IRSA¶
In the next post, we'll dive deep into IAM Roles for Service Accounts (IRSA) to give your runners secure, credential-free access to AWS services. You'll learn:
- How IRSA eliminates the need for hardcoded AWS credentials
- Setting up role-based permissions for different pipeline stages
- Multi-environment security strategies
- Best practices for CI/CD security on AWS
Preview: What IRSA Enables¶
# Example from Part 2 - No secrets needed!
deploy-to-s3:
  stage: deploy
  script:
    - aws s3 sync ./build s3://my-app-bucket/
    # ↑ This works without any AWS credentials in GitLab!
  tags:
    - kubernetes
Coming in Parts 3 & 4¶
- Part 3: S3-backed distributed caching for lightning-fast builds
- Part 4: Advanced SPOT vs ON-DEMAND strategies for maximum cost savings
🎉 Congratulations!
You now have a production-ready GitLab runner infrastructure running on EKS Auto Mode! Your runners are automatically optimized, secure, and ready to scale with your development team's needs.
Next Steps:
- Test thoroughly with your actual CI/CD pipelines
- Monitor costs in the AWS console to see the Auto Mode benefits
- Stay tuned for Part 2 where we'll add enterprise-grade security with IRSA
- Share feedback - let me know how this setup works for your team!
Ready to level up your GitLab runners with advanced security? Part 2: Advanced GitLab Runner Security with IRSA → (Coming Soon!)