Build Your Own GitLab Runners on EKS
As more and more DevOps teams adopt GitLab for internal development and lean on GitLab CI/CD pipelines, finding ways to manage costs while still delivering highly available, scalable pipelines becomes crucial.
This is Part 1 of a 4-part series on building production-ready GitLab runners on AWS. In this foundational post, I'll show you how to leverage EKS Auto Mode with Terraform and Helm to create a robust, scalable runner infrastructure.
Why GitLab Runners on Kubernetes?¶
At a high level, a GitLab runner is an agent that executes the CI jobs in your pipelines. Runners can be installed directly on Linux, Windows, or macOS hosts, or run jobs through executors such as Docker or Kubernetes.
GitLab offers a great selection of SaaS (hosted) runners and a generous allowance of included compute minutes (10,000 per month at the time of writing), but you would be surprised how quickly development teams can burn through it with heavy test suites, pipeline troubleshooting, and the like.
The Kubernetes Advantage¶
Kubernetes adds scalability and elasticity on top of that, plus painless upgrades with Helm. If you have seen my other post on Karpenter and Why You Should Ditch Cluster Autoscaler, then you already have an inkling of how much scalability and flexibility EKS can give your jobs.
Why EKS Auto Mode Changes Everything¶
EKS Auto Mode is a game-changer for GitLab runners because it:
- ✅ Automatically manages Karpenter - no manual Karpenter installation or configuration
- ✅ Optimizes node provisioning - intelligent instance selection and scaling
- ✅ Reduces operational overhead - AWS handles the complex stuff
- ✅ Provides enterprise-grade reliability - built-in best practices
- ✅ Simplifies cost management - automatic Spot integration and optimization
Series Overview¶
This is a comprehensive 4-part series that will take you from zero to production-ready GitLab runners:
Part 1: Build Your Own GitLab Runners on EKS Auto Mode (This Post)¶
- EKS Auto Mode setup with Terraform
- GitLab Runner Helm installation and configuration
- Basic runner deployment and registration
- Testing your first pipeline
Part 2: Advanced GitLab Runner Security with IRSA (Coming Next)¶
- Deep dive into IAM Roles for Service Accounts (IRSA)
- Secure AWS permissions without storing credentials
- Multi-environment security strategies
- CI/CD security best practices
Part 3: Optimizing GitLab Runner Performance with S3 Caching (Coming Soon)¶
- S3-backed distributed caching setup
- Cache strategies for different workload types
- Performance benchmarking and optimization
- Cost-effective cache management
Part 4: Smart SPOT vs ON-DEMAND Runner Strategies (Coming Soon)¶
- Advanced node selection strategies
- Runner tagging for workload separation
- Mixed SPOT/ON-DEMAND configurations
- Reliability vs cost optimization
Architecture Overview¶
Here's what we'll build in this first part:
graph TB
A[GitLab.com] --> B[GitLab Runner Pods]
B --> C[EKS Auto Mode Cluster]
C --> D[Karpenter - Managed by AWS]
D --> E[EC2 Spot Instances]
D --> F[EC2 On-Demand Instances]
G[Terraform] --> C
G --> H[Helm Chart]
H --> B
I[Job Pods] --> C
B --> I
Prerequisites¶
Before we dive in, make sure you have:
- AWS CLI configured with appropriate permissions
- Terraform >= 1.5 installed
- kubectl and Helm 3.x installed
- GitLab account with a project for testing
- Basic understanding of Kubernetes and CI/CD concepts
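A quick way to confirm the tooling is in place before you start:
# Verify required tooling
aws --version
terraform version
kubectl version --client
helm version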
Cost Considerations
EKS Auto Mode adds a management fee on top of the standard $0.10/hour EKS control plane price, billed per managed instance, but the operational savings and automatic optimizations often make this worthwhile for production workloads.
Setting Up EKS Auto Mode with Terraform¶
Let's start by creating our EKS Auto Mode cluster. This is the foundation that will run our GitLab runners.
Project Structure¶
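To keep things easy to follow, everything in this post lives in one small Terraform module; a minimal layout (the file names here are simply the ones used throughout the post) might look like:
gitlab-runners-eks/
├── eks.tf
├── iam.tf
├── variables.tf
├── outputs.tf
├── terraform.tfvars
└── gitlab-runner-values.yaml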
Core EKS Auto Mode Configuration¶
# eks.tf
resource "aws_eks_cluster" "gitlab_runners" {
  name     = var.cluster_name
  role_arn = aws_iam_role.eks_cluster.arn
  version  = var.kubernetes_version

  # EKS Auto Mode needs API authentication and no self-managed add-ons
  bootstrap_self_managed_addons = false
  access_config {
    authentication_mode = "API"
  }

  # EKS Auto Mode configuration - instance selection is handled by the
  # built-in node pools, so no instance types are listed here
  compute_config {
    enabled       = true
    node_pools    = ["system", "general-purpose"]
    node_role_arn = aws_iam_role.eks_node.arn # node role defined in iam.tf
  }

  # Auto Mode also expects its block storage and load balancing capabilities enabled
  kubernetes_network_config {
    elastic_load_balancing {
      enabled = true
    }
  }
  storage_config {
    block_storage {
      enabled = true
    }
  }

  vpc_config {
    subnet_ids              = var.subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = true
    public_access_cidrs     = var.public_access_cidrs
  }

  # Enable logging for troubleshooting
  enabled_cluster_log_types = [
    "api", "audit", "authenticator", "controllerManager", "scheduler"
  ]

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
    aws_iam_role_policy_attachment.eks_auto_mode,
  ]

  tags = merge(var.tags, {
    Name    = var.cluster_name
    Purpose = "GitLab-Runners"
  })
}
Required IAM Configuration¶
# iam.tf
resource "aws_iam_role" "eks_cluster" {
  name = "${var.cluster_name}-cluster-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # Auto Mode needs sts:TagSession in addition to sts:AssumeRole
        Action = ["sts:AssumeRole", "sts:TagSession"]
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster.name
}

# Auto Mode requires additional managed policies on the cluster role
resource "aws_iam_role_policy_attachment" "eks_auto_mode" {
  for_each = toset([
    "AmazonEKSComputePolicy",
    "AmazonEKSBlockStoragePolicy",
    "AmazonEKSLoadBalancingPolicy",
    "AmazonEKSNetworkingPolicy",
  ])
  policy_arn = "arn:aws:iam::aws:policy/${each.value}"
  role       = aws_iam_role.eks_cluster.name
}
Variables Configuration¶
# variables.tf
variable "cluster_name" {
  description = "Name of the EKS cluster"
  type        = string
  default     = "gitlab-runners-eks"
}

variable "kubernetes_version" {
  description = "Kubernetes version"
  type        = string
  default     = "1.30"
}

variable "subnet_ids" {
  description = "List of subnet IDs for the EKS cluster"
  type        = list(string)
}

variable "public_access_cidrs" {
  description = "List of CIDR blocks for API server access"
  type        = list(string)
  default     = ["0.0.0.0/0"] # Restrict this in production!
}

variable "tags" {
  description = "Tags to apply to resources"
  type        = map(string)
  default     = {}
}
Deploying the Infrastructure¶
# Initialize and apply Terraform
terraform init
terraform plan
terraform apply
# Configure kubectl
aws eks update-kubeconfig --region us-west-2 --name gitlab-runners-eks
# Verify cluster is ready
kubectl get nodes
kubectl get pods -A
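If you want to confirm Auto Mode is actually enabled, you can query the cluster's compute configuration; the --query path below assumes the computeConfig field that describe-cluster returns for Auto Mode clusters.
# Confirm EKS Auto Mode is enabled on the cluster
aws eks describe-cluster --region us-west-2 --name gitlab-runners-eks \
  --query 'cluster.computeConfig'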
Installing GitLab Runner with Helm¶
Now let's install the GitLab Runner using the official Helm chart, optimized for our EKS Auto Mode setup.
Adding the GitLab Helm Repository¶
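First, add the official GitLab chart repository and refresh your local index:
# Add the GitLab Helm repository
helm repo add gitlab https://charts.gitlab.io
helm repo update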
GitLab Runner Configuration¶
Create a values.yaml file for our runner configuration:
# gitlab-runner-values.yaml
gitlabUrl: https://gitlab.com/
runnerToken: "YOUR_RUNNER_TOKEN"  # Runner authentication token from GitLab (Settings → CI/CD → Runners)
concurrent: 10
checkInterval: 30

rbac:
  create: true
  serviceAccountName: gitlab-runner

runners:
  config: |
    [[runners]]
      name = "GitLab Runner - EKS Auto Mode"
      url = "https://gitlab.com/"
      # the token is injected from runnerToken above
      executor = "kubernetes"
      [runners.kubernetes]
        namespace = "gitlab-runner"
        image = "ubuntu:22.04"
        # Resource requests optimized for Auto Mode
        cpu_request = "100m"
        memory_request = "128Mi"
        cpu_limit = "2000m"
        memory_limit = "4Gi"
        # Helper image configuration
        helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest"
        # Node selection - let Auto Mode decide
        node_selector_overwrite_allowed = ".*"
        node_tolerations_overwrite_allowed = ".*"
        # Efficient pod cleanup
        cleanup_grace_period_seconds = 30
        # Enable service account overrides for the future IRSA setup
        service_account_overwrite_allowed = ".*"

# Resource limits for the runner pod itself
resources:
  limits:
    memory: 256Mi
    cpu: 200m
  requests:
    memory: 128Mi
    cpu: 100m

# Anti-affinity to spread runners across nodes
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - gitlab-runner
          topologyKey: kubernetes.io/hostname
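Before installing, you can render the chart locally to sanity-check the values file; this is plain helm template usage and applies nothing to the cluster.
# Render the chart locally to review the generated manifests
helm template gitlab-runner gitlab/gitlab-runner \
  --namespace gitlab-runner \
  --values gitlab-runner-values.yaml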
Deploy the GitLab Runner¶
# Create namespace
kubectl create namespace gitlab-runner
# Install GitLab Runner
helm install gitlab-runner gitlab/gitlab-runner \
--namespace gitlab-runner \
--values gitlab-runner-values.yaml
# Verify deployment
kubectl get pods -n gitlab-runner
kubectl logs -f deployment/gitlab-runner -n gitlab-runner
Registering Your Runner¶
To get your runner token:
- Go to your GitLab project
- Navigate to Settings → CI/CD
- Expand the Runners section
- Create a new project runner with the kubernetes tag and copy its runner authentication token (recent GitLab versions use this in place of the legacy registration token)
Update your values.yaml with the token and upgrade:
helm upgrade gitlab-runner gitlab/gitlab-runner \
--namespace gitlab-runner \
--values gitlab-runner-values.yaml
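If you'd rather not keep the token in the values file at all, Helm's --set flag can supply it at install or upgrade time instead:
# Pass the token on the command line instead of storing it in values.yaml
helm upgrade gitlab-runner gitlab/gitlab-runner \
  --namespace gitlab-runner \
  --values gitlab-runner-values.yaml \
  --set runnerToken="YOUR_RUNNER_TOKEN"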
Testing Your First Pipeline¶
Create a simple .gitlab-ci.yml to test your runner:
# .gitlab-ci.yml
stages:
  - test
  - build

variables:
  KUBERNETES_CPU_REQUEST: "100m"
  KUBERNETES_MEMORY_REQUEST: "128Mi"

test-job:
  stage: test
  script:
    - echo "Hello from EKS Auto Mode GitLab Runner!"
    - uname -a
    - echo "Node information:"
    - cat /proc/version
  # jobs only reach this runner if it carries the kubernetes tag (or accepts untagged jobs)
  tags:
    - kubernetes

build-job:
  stage: build
  script:
    - echo "Building application..."
    - sleep 30  # Simulate build time
    - echo "Build complete!"
  tags:
    - kubernetes
Monitoring and Verification¶
Check Runner Status in GitLab¶
- Go to Settings → CI/CD → Runners
- Your runner should appear with a green dot (online)
- Check the System ID and Last Contact information
Monitor Cluster Resources¶
# Watch nodes scale up/down
kubectl get nodes -w
# Monitor runner pods
kubectl get pods -n gitlab-runner -w
# Check job pods
kubectl get pods --all-namespaces | grep runner-
# View cluster events
kubectl get events --sort-by=.metadata.creationTimestamp
EKS Auto Mode Monitoring¶
EKS Auto Mode provides built-in observability. Check the AWS Console for:
- Node Pool Usage - How Auto Mode is selecting instances
- Scaling Events - When and why nodes are added/removed
- Cost Optimization - Spot vs On-Demand usage
- Resource Utilization - CPU and memory efficiency
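Auto Mode also surfaces its Karpenter-style NodePool and NodeClaim objects through the Kubernetes API, so you can watch provisioning decisions from kubectl as well:
# Inspect Auto Mode's built-in node pools and the nodes they have claimed
kubectl get nodepools
kubectl get nodeclaims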
What's Working Behind the Scenes¶
With EKS Auto Mode, here's what's automatically handled:
Intelligent Node Provisioning¶
- Instance Selection: Auto Mode chooses optimal instance types based on workload requirements
- Availability Zone Distribution: Automatic spread across AZs for reliability
- Spot Integration: Automatic Spot instance usage when appropriate
- Right-sizing: Nodes sized appropriately for your workloads
Operational Excellence¶
- Karpenter Management: AWS manages Karpenter installation and updates
- Security Patches: Automatic node AMI updates and patching
- Monitoring Integration: Built-in CloudWatch metrics and logging
- Cost Optimization: Continuous optimization of instance selection
Troubleshooting Common Issues¶
Runner Won't Register¶
# Check runner logs
kubectl logs deployment/gitlab-runner -n gitlab-runner
# Verify token and URL
kubectl describe secret gitlab-runner-secret -n gitlab-runner
Jobs Stuck in Pending¶
# Check for resource constraints
kubectl describe pod <job-pod-name>
# Verify node capacity
kubectl describe nodes
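If pods stay Pending, the scheduler's events usually explain why; this uses standard kubectl event filtering:
# Surface scheduling failures across all namespaces
kubectl get events -A --field-selector reason=FailedScheduling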
Slow Job Startup¶
Cold starts happen when Auto Mode has to bring up a fresh node for a job. This is where Auto Mode shines: job pods typically start in under 30 seconds thanks to fast, intelligent node provisioning.
Cost Optimization Tips¶
Even with this basic setup, you're already benefiting from:
- ✅ Automatic Spot Usage: Auto Mode uses Spot instances when safe
- ✅ Right-sized Nodes: no over-provisioning of compute resources
- ✅ Fast Scale-down: unused nodes terminated quickly
- ✅ Efficient Scheduling: jobs packed efficiently onto existing nodes
What's Next?¶
This foundational setup gives you a robust, scalable GitLab runner infrastructure on EKS Auto Mode. But we're just getting started!
Coming in Part 2: Advanced Security with IRSA¶
In the next post, we'll dive deep into IAM Roles for Service Accounts (IRSA) to give your runners secure, credential-free access to AWS services. You'll learn:
- How IRSA eliminates the need for hardcoded AWS credentials
- Setting up role-based permissions for different pipeline stages
- Multi-environment security strategies
- Best practices for CI/CD security on AWS
Preview: What IRSA Enables¶
# Example from Part 2 - No secrets needed!
deploy-to-s3:
  stage: deploy
  script:
    - aws s3 sync ./build s3://my-app-bucket/
    # ↑ This works without any AWS credentials in GitLab!
  tags:
    - kubernetes
Coming in Parts 3 & 4¶
- Part 3: S3-backed distributed caching for lightning-fast builds
- Part 4: Advanced SPOT vs ON-DEMAND strategies for maximum cost savings
🎉 Congratulations!
You now have a production-ready GitLab runner infrastructure running on EKS Auto Mode! Your runners are automatically optimized, secure, and ready to scale with your development team's needs.
Next Steps:
- Test thoroughly with your actual CI/CD pipelines
- Monitor costs in the AWS console to see the Auto Mode benefits
- Stay tuned for Part 2 where we'll add enterprise-grade security with IRSA
- Share feedback - let me know how this setup works for your team!
Ready to level up your GitLab runners with advanced security? Part 2: Advanced GitLab Runner Security with IRSA → (Coming Soon!)