Karpenter and Why You Should Ditch Cluster Autoscaler

If you have used Amazon Elastic Kubernetes Service (EKS), you may have noticed that node group autoscaling leaves a bit to be desired. It can be slow, clunky to configure, and mixing instance types is difficult. My biggest complaint? Nodes launch SLOW!

Enter Karpenter. Karpenter is a node provisioner - one that now uses NodePool terminology, similar to GKE and others - and it uses the EC2 Fleet API to launch nodes directly into the cluster, no managed node group required!

Why would you want to do this? How about nodes that launch and reach Ready state in k8s in 10 seconds!

Read on to see how you can scale faster, more reliably, with greater flexibility, and save money by using Karpenter.

The Problem with Traditional Cluster Autoscaler

Before we dive into Karpenter's magic, let's talk about why the traditional approach falls short:

Slow Node Provisioning

Traditional EKS managed node groups can take 2-5 minutes to launch new nodes. In the world of modern application scaling, that's an eternity. Your pods sit in Pending state while users experience degraded performance.

Limited Instance Type Flexibility

Node groups are tied to specific instance types or limited instance families. Want to mix c5.large and m5.xlarge in the same scaling group? Good luck with that complexity.

Complex Configuration Management

Managing multiple node groups for different workload types means:

  • Multiple Auto Scaling Groups to manage
  • Complex tagging and labeling strategies
  • Difficulty optimizing costs across instance types
  • Manual intervention for scaling policies

Poor Spot Instance Integration

While you can use Spot instances with managed node groups, the configuration is cumbersome and doesn't automatically optimize for the best available instances.

Karpenter: The Game Changer

Karpenter approaches node provisioning fundamentally differently:

Direct EC2 Fleet API Integration

Instead of going through Auto Scaling Groups, Karpenter talks directly to the EC2 Fleet API. This eliminates several layers of abstraction and dramatically improves provisioning speed.

Intelligent Instance Selection

Karpenter automatically selects the best available instance types based on:

  • Current AWS pricing
  • Spot instance availability
  • Your workload requirements
  • Resource constraints
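
To make "your workload requirements" concrete, here is a small, hypothetical sketch (the local name and exact values are illustrative, not part of this guide's code) of the kind of requirements a NodePool can express using Karpenter's well-known labels. Given constraints like these, Karpenter picks whichever matching instance types are cheapest and available:

# Hypothetical example: requirements a NodePool could declare.
# Karpenter chooses among all instance types that satisfy these constraints.
locals {
  example_nodepool_requirements = [
    # Allow Spot with On-Demand as a fallback
    { key = "karpenter.sh/capacity-type", operator = "In", values = ["spot", "on-demand"] },
    # Any compute-, general-, or memory-optimized family
    { key = "karpenter.k8s.aws/instance-category", operator = "In", values = ["c", "m", "r"] },
    # x86_64, current-ish generations only
    { key = "kubernetes.io/arch", operator = "In", values = ["amd64"] },
    { key = "karpenter.k8s.aws/instance-generation", operator = "Gt", values = ["4"] },
  ]
}

On EKS Auto Mode the AWS-specific labels use the eks.amazonaws.com prefix instead of karpenter.k8s.aws, but the idea is the same.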

Dynamic NodePool Creation

Rather than pre-defining static node groups, Karpenter creates and destroys resources dynamically based on actual demand.

What Makes This Guide Different

Most Karpenter tutorials show you basic YAML configurations and call it a day. But here's the thing - nobody talks about automating NodePool and NodeClass configuration at scale.

In this guide, I'll show you my Terraform + Helm solution that:

  • Automatically configures NodePools and NodeClasses
  • Works seamlessly with EKS Auto Mode clusters
  • Eliminates manual YAML management
  • Provides Infrastructure as Code for your entire scaling strategy
  • Scales to multiple clusters and environments

This approach is particularly valuable for EKS Auto Mode, where you don't need to self-manage Karpenter but still want fine-grained control over node provisioning behavior.

Architecture Overview

Here's how our automated solution works:

graph TB
 A[Terraform] --> B[EKS Auto Mode Cluster]
 A --> C[Helm Chart for NodePools]
 A --> D[IAM Roles & Policies]

 C --> E[Automated NodePool Creation]
 C --> F[Automated NodeClass Creation]

 E --> G[Workload-Specific Pools]
 F --> H[Instance Type Optimization]

 G --> I[Production Workloads]
 G --> J[Development/Testing]
 G --> K[Batch/Spot Workloads]

Prerequisites

Before we begin, ensure you have:

  • AWS CLI configured with appropriate permissions
  • Terraform >= 1.5 installed
  • kubectl configured for your cluster
  • Helm 3.x installed
  • An existing EKS cluster (Auto Mode or traditional)

EKS Auto Mode Compatibility

This solution is optimized for EKS Auto Mode clusters but works perfectly with traditional EKS clusters where you manage Karpenter yourself.

The Terraform + Helm Architecture

Our solution consists of three main components:

1. Terraform Infrastructure Module

Manages the underlying AWS resources, IAM permissions, and cluster configuration.

2. Dynamic Helm Chart

Automatically generates NodePool and NodeClass configurations based on your requirements.

3. Configuration-Driven Approach

Uses Terraform variables and locals to drive NodePool creation, making it repeatable across environments.
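
As a preview of what configuration-driven means here, the sketch below shows one possible shape for a node_pools variable (field names and defaults are illustrative assumptions, not the final schema we build later) that the Helm chart could render into NodePools:

# Hypothetical shape of the configuration that drives NodePool creation.
variable "node_pools" {
  description = "Node pool definitions rendered into Karpenter NodePools by the Helm chart"
  type = map(object({
    capacity_types      = list(string)
    instance_categories = list(string)
    cpu_limit           = number
    labels              = map(string)
  }))

  default = {
    production = {
      capacity_types      = ["on-demand"]
      instance_categories = ["m", "c"]
      cpu_limit           = 256
      labels              = { workload = "production" }
    }
    batch = {
      capacity_types      = ["spot"]
      instance_categories = ["c", "m", "r"]
      cpu_limit           = 512
      labels              = { workload = "batch" }
    }
  }
}

Because the pools are just data, adding one for a new team or environment means adding an entry to this map rather than hand-writing another manifest.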

Coming Up Next

In the following sections, we'll build this solution step by step:

  1. Setting Up the Terraform Foundation - IAM roles, policies, and base configuration
  2. Creating the Dynamic Helm Chart - Our secret sauce for automated NodePool generation
  3. NodePool Configuration Strategies - Different approaches for different workload types
  4. EKS Auto Mode Integration - Specific considerations for Auto Mode clusters
  5. Cost Optimization Techniques - Maximizing Spot usage and instance selection
  6. Monitoring and Observability - Tracking performance and costs
  7. Troubleshooting Guide - Common issues and solutions

Let's start building something awesome! 🚀


Terraform Foundation

First, let's establish our Terraform foundation. This module will handle all the AWS-side configuration needed for our automated Karpenter setup.

Directory Structure

karpenter-terraform/
├── main.tf
├── variables.tf
├── outputs.tf
├── iam.tf
├── helm.tf
└── charts/
    └── karpenter-nodepools/
        ├── Chart.yaml
        ├── values.yaml
        └── templates/
            ├── nodepool.yaml
            └── nodeclass.yaml

Core Terraform Configuration

# main.tf
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.12"
    }
  }
}

# Data sources for the existing EKS cluster
data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
  name = var.cluster_name
}

# Configure the Helm provider
provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
}
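
main.tf references var.cluster_name, which lives in the variables.tf from the directory layout above. A minimal version of that file (just the one variable this snippet needs; more variables come later) might look like:

# variables.tf
variable "cluster_name" {
  description = "Name of the existing EKS cluster to install NodePools into"
  type        = string
}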

This foundation sets us up for the automated NodePool creation we'll implement next...


What's Next?

In the next section, I'll show you the Dynamic Helm Chart that automatically generates NodePool configurations based on your Terraform variables. This is where the magic happens!

Continue reading, or let me know if you'd like to dive deeper into any specific aspect of this architecture.