Karpenter and Why You Should Ditch Cluster Autoscaler

If you have used Amazon Elastic Kubernetes Service (EKS), you may have noticed that node group autoscaling leaves a bit to be desired. It can be slow, clunky to configure, and mixing instance types is difficult. My biggest complaint? Nodes launch SLOW!

Enter Karpenter. Karpenter is a node provisioner - one that now uses NodePool terminology, similar to GKE and others - and it uses the EC2 Fleet API to launch nodes directly into the cluster, no managed node group required!

Why would you want to do this? How about nodes that launch and reach Ready state in k8s in 10 seconds!

Read on to see how you can scale faster, more reliably, with greater flexibility, and save money by using Karpenter.

The Problem with Traditional Cluster Autoscaler

Before we dive into Karpenter's magic, let's talk about why the traditional approach falls short:

Slow Node Provisioning

Traditional EKS managed node groups can take 2-5 minutes to launch new nodes. In the world of modern application scaling, that's an eternity. Your pods sit in Pending state while users experience degraded performance.

Limited Instance Type Flexibility

Node groups are tied to specific instance types or limited instance families. Want to mix c5.large and m5.xlarge in the same scaling group? Good luck with that complexity.

Complex Configuration Management

Managing multiple node groups for different workload types means:

  • Multiple Auto Scaling Groups to manage
  • Complex tagging and labeling strategies
  • Difficulty optimizing costs across instance types
  • Manual intervention for scaling policies

Poor Spot Instance Integration

While you can use Spot instances with managed node groups, the configuration is cumbersome and doesn't automatically optimize for the best available instances.

Karpenter: The Game Changer

Karpenter approaches node provisioning fundamentally differently:

Direct EC2 Fleet API Integration

Instead of going through Auto Scaling Groups, Karpenter talks directly to the EC2 Fleet API. This eliminates several layers of abstraction and dramatically improves provisioning speed.

Intelligent Instance Selection

Karpenter automatically selects the best available instance types based on:

  • Current AWS pricing
  • Spot instance availability
  • Your workload requirements
  • Resource constraints
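
To make "your workload requirements" concrete, here is a small, hypothetical sketch (the local name and exact values are illustrative, not part of this guide's code) of the kind of requirements a NodePool can express using Karpenter's well-known labels. Given constraints like these, Karpenter picks whichever matching instance types are cheapest and available:

# Hypothetical example: requirements a NodePool could declare.
# Karpenter chooses among all instance types that satisfy these constraints.
locals {
  example_nodepool_requirements = [
    # Allow Spot with On-Demand as a fallback
    { key = "karpenter.sh/capacity-type", operator = "In", values = ["spot", "on-demand"] },
    # Any compute-, general-, or memory-optimized family
    { key = "karpenter.k8s.aws/instance-category", operator = "In", values = ["c", "m", "r"] },
    # x86_64, current-ish generations only
    { key = "kubernetes.io/arch", operator = "In", values = ["amd64"] },
    { key = "karpenter.k8s.aws/instance-generation", operator = "Gt", values = ["4"] },
  ]
}

On EKS Auto Mode the AWS-specific labels use the eks.amazonaws.com prefix instead of karpenter.k8s.aws, but the idea is the same.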

Dynamic NodePool Creation

Rather than pre-defining static node groups, Karpenter creates and destroys resources dynamically based on actual demand.

What Makes This Guide Different

Most Karpenter tutorials show you basic YAML configurations and call it a day. But here's the thing - nobody talks about automating NodePool and NodeClass configuration at scale.

In this guide, I'll show you my Terraform + Helm solution that:

  • Automatically configures NodePools and NodeClasses
  • Works seamlessly with EKS Auto Mode clusters
  • Eliminates manual YAML management
  • Provides Infrastructure as Code for your entire scaling strategy
  • Scales to multiple clusters and environments

This approach is particularly valuable for EKS Auto Mode, where you don't need to self-manage Karpenter but still want fine-grained control over node provisioning behavior.

Architecture Overview

Here's how our automated solution works:

graph TB
 A[Terraform] --> B[EKS Auto Mode Cluster]
 A --> C[Helm Chart for NodePools]
 A --> D[IAM Roles & Policies]

 C --> E[Automated NodePool Creation]
 C --> F[Automated NodeClass Creation]

 E --> G[Workload-Specific Pools]
 F --> H[Instance Type Optimization]

 G --> I[Production Workloads]
 G --> J[Development/Testing]
 G --> K[Batch/Spot Workloads]

Prerequisites

Before we begin, ensure you have:

  • AWS CLI configured with appropriate permissions
  • Terraform >= 1.5 installed
  • kubectl configured for your cluster
  • Helm 3.x installed
  • An existing EKS cluster (Auto Mode or traditional)

EKS Auto Mode Compatibility

This solution is optimized for EKS Auto Mode clusters but works perfectly with traditional EKS clusters where you manage Karpenter yourself.

The Terraform + Helm Architecture

Our solution consists of three main components:

1. Terraform Infrastructure Module

Manages the underlying AWS resources, IAM permissions, and cluster configuration.

2. Dynamic Helm Chart

Automatically generates NodePool and NodeClass configurations based on your requirements.

3. Configuration-Driven Approach

Uses Terraform variables and locals to drive NodePool creation, making it repeatable across environments.
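
As a preview of what configuration-driven means here, the sketch below shows one possible shape for a node_pools variable (field names and defaults are illustrative assumptions, not the final schema we build later) that the Helm chart could render into NodePools:

# Hypothetical shape of the configuration that drives NodePool creation.
variable "node_pools" {
  description = "Node pool definitions rendered into Karpenter NodePools by the Helm chart"
  type = map(object({
    capacity_types      = list(string)
    instance_categories = list(string)
    cpu_limit           = number
    labels              = map(string)
  }))

  default = {
    production = {
      capacity_types      = ["on-demand"]
      instance_categories = ["m", "c"]
      cpu_limit           = 256
      labels              = { workload = "production" }
    }
    batch = {
      capacity_types      = ["spot"]
      instance_categories = ["c", "m", "r"]
      cpu_limit           = 512
      labels              = { workload = "batch" }
    }
  }
}

Because the pools are just data, adding one for a new team or environment means adding an entry to this map rather than hand-writing another manifest.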

Coming Up Next

In the following sections, we'll build this solution step by step:

  1. Setting Up the Terraform Foundation - IAM roles, policies, and base configuration
  2. Creating the Dynamic Helm Chart - Our secret sauce for automated NodePool generation
  3. NodePool Configuration Strategies - Different approaches for different workload types
  4. EKS Auto Mode Integration - Specific considerations for Auto Mode clusters
  5. Cost Optimization Techniques - Maximizing Spot usage and instance selection
  6. Monitoring and Observability - Tracking performance and costs
  7. Troubleshooting Guide - Common issues and solutions

Let's start building something awesome! 🚀


Terraform Foundation

First, let's establish our Terraform foundation. This module will handle all the AWS-side configuration needed for our automated Karpenter setup.

Directory Structure

karpenter-terraform/
├── main.tf
├── variables.tf
├── outputs.tf
├── iam.tf
├── helm.tf
└── charts/
    └── karpenter-nodepools/
        ├── Chart.yaml
        ├── values.yaml
        └── templates/
            ├── nodepool.yaml
            └── nodeclass.yaml

Core Terraform Configuration

# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.12"
    }
  }
}

# Data sources for the existing EKS cluster
data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
  name = var.cluster_name
}

# Configure the Helm provider
provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
}
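
main.tf references var.cluster_name, which lives in the variables.tf from the directory layout above. A minimal version of that file (just the one variable this snippet needs; more variables come later) might look like:

# variables.tf
variable "cluster_name" {
  description = "Name of the existing EKS cluster to install NodePools into"
  type        = string
}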

This foundation sets us up for the automated NodePool creation we'll implement next...


What's Next?

In the next section, I'll show you the Dynamic Helm Chart that automatically generates NodePool configurations based on your Terraform variables. This is where the magic happens!

Continue reading, or let me know if you'd like to dive deeper into any specific aspect of this architecture.