Karpenter — Kubernetes Node Autoscaling Skill (karpenter)

Karpenter is a node autoscaling and cost-optimization tool built for Kubernetes. It handles intelligent node selection, spot instance management, and node consolidation, helping to lower cloud compute costs and improve resource utilization. Keywords: Kubernetes, node autoscaling, cost optimization, cloud native, DevOps, cloud computing, autoscaling tools, Karpenter

Docker/K8s · 0 installs · 0 views · Updated 3/24/2026

name: karpenter description: Kubernetes node autoscaling and cost optimization with Karpenter. Use when implementing node provisioning, spot instance management, cluster right-sizing, node consolidation, or reducing compute costs. Covers NodePool configuration, EC2NodeClass setup, disruption budgets, spot/on-demand mix strategies, multi-architecture support, and capacity type selection. triggers:

  • karpenter
  • node autoscaling
  • nodepool
  • ec2nodeclass
  • provisioner
  • spot instances
  • on-demand instances
  • node consolidation
  • node termination
  • cluster autoscaling
  • right-sizing
  • capacity type
  • node disruption
  • compute costs
  • instance selection
  • graviton
  • arm64

allowed-tools: Read, Grep, Glob, Edit, Write, Bash

Karpenter

Overview

Karpenter is a Kubernetes node autoscaler that provisions right-sized compute in response to changing application load. Unlike Cluster Autoscaler, which scales predefined node groups, Karpenter provisions nodes based on aggregate pod resource requests, enabling better bin-packing and cost optimization.

Key Differences from Cluster Autoscaler

  • Direct provisioning: talks to cloud provider APIs directly (no node groups required)
  • Fast scaling: provisions nodes in seconds rather than minutes
  • Flexible instance selection: automatically chooses from all eligible instance types
  • Consolidation: proactively replaces nodes with cheaper alternatives
  • Spot instance optimization: first-class support with automatic fallback

When to Use Karpenter

  • Running workloads with diverse resource requirements
  • Needing fast scaling (sub-minute response)
  • Optimizing costs with spot instances and Graviton (ARM64)
  • Consolidating to reduce cluster waste and over-provisioning
  • Clusters with unpredictable or bursty workloads
  • Right-sizing infrastructure based on actual usage patterns
  • Automatically managing mixed capacity types (spot/on-demand)

Instructions

1. Installation and Setup

  • Install the Karpenter controller in the cluster
  • Configure cloud provider credentials (IAM roles)
  • Set up instance profiles and security groups
  • Create NodePools for different workload types
  • Define an EC2NodeClass (AWS) or the equivalent for your provider
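On AWS/EKS, installation is typically done with Helm from the public OCI registry. A minimal sketch — the cluster name, endpoint, and IAM role ARN are placeholders you must substitute, and the chart version should match your Kubernetes version:

```shell
# Install the Karpenter controller via Helm (version and values are illustrative)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --namespace karpenter --create-namespace \
  --version v0.33.0 \
  --set settings.clusterName=my-cluster \
  --set settings.clusterEndpoint=https://EXAMPLE.eks.amazonaws.com \
  --set serviceAccount.annotations."eks\.amazonaws\.com/role-arn"=arn:aws:iam::111122223333:role/KarpenterControllerRole \
  --wait

# Verify the controller is running before creating NodePools
kubectl get pods -n karpenter
```

These commands require a live cluster and pre-provisioned IAM resources; the Terraform Integration section below shows the same installation managed as code.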

2. Design a NodePool Strategy

  • Separate NodePools for different workload classes
  • Define instance type families and sizes
  • Configure the spot/on-demand mix
  • Set resource limits per NodePool
  • Plan multi-AZ distribution

3. Configure Disruption Management

  • Set disruption budgets to control churn
  • Configure consolidation policies
  • Define expiration windows for the node lifecycle
  • Handle workload-specific disruption constraints
  • Test disruption scenarios

4. Optimize Cost and Performance

  • Enable consolidation for cost savings
  • Use spot instances with fallback strategies
  • Set appropriate resource requests on pods (Karpenter depends on accurate requests)
  • Monitor node utilization and waste
  • Adjust instance type constraints based on usage
  • Leverage Graviton (ARM64) instances for roughly 20% lower cost
  • Configure capacity type weights to prefer spot over on-demand

5. Cost Optimization Strategies

  • Spot instances: configure a 70-90% spot mix for fault-tolerant workloads
  • Graviton (ARM64): use the c7g, m7g, r7g families for lower cost
  • Consolidation: enable the WhenUnderutilized policy to replace expensive nodes
  • Instance diversity: a broad instance family selection improves spot availability
  • Right-sizing: let Karpenter bin-pack efficiently instead of over-provisioning

6. Spot Instance Management

  • Use a broad instance type selection (10+ families) for better spot availability
  • Configure automatic fallback to on-demand when spot is unavailable
  • Implement Pod Disruption Budgets to bound the blast radius
  • Add graceful termination handlers (preStop hooks) in applications
  • Monitor spot interruption rates and adjust instance selection
  • Spread across availability zones to reduce correlated failures

7. Node Consolidation

  • WhenUnderutilized: proactively replaces nodes with cheaper/smaller alternatives
  • WhenEmpty: consolidates only completely empty nodes (conservative)
  • Configure a consolidateAfter delay to prevent churn (typically 30s-600s)
  • Use disruption budgets to cap the consolidation rate (5-20% per window)
  • Pod Disruption Budgets are respected during consolidation
  • Set expiration windows to force periodic node refresh
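To exempt specific pods from consolidation entirely, Karpenter (v1beta1) honors the `karpenter.sh/do-not-disrupt` pod annotation: a node running such a pod will not be voluntarily disrupted. A sketch — the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job-runner
  annotations:
    # Karpenter will not voluntarily disrupt (consolidate/expire) the node while this pod runs
    karpenter.sh/do-not-disrupt: "true"
spec:
  containers:
    - name: job
      image: my-batch-job:latest  # placeholder image
      resources:
        requests:
          cpu: "2"
          memory: 4Gi
```

This only blocks voluntary disruption (consolidation and expiration); spot reclamation by the cloud provider can still terminate the node.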

Best Practices

  1. Start conservative: begin with restrictive instance types and expand based on observation
  2. Use disruption budgets: prevent too many nodes from being disrupted at once
  3. Set pod resource requests: Karpenter relies on accurate requests for scheduling
  4. Enable consolidation: let Karpenter optimize node utilization automatically
  5. Separate workload classes: use multiple NodePools for differing requirements
  6. Monitor provisioning: track provisioning latency and failures
  7. Test spot interruptions: ensure graceful handling of spot instance termination
  8. Use topology spread: combine with pod topology constraints for availability

Examples

Example 1: Basic NodePool with Multiple Instance Types

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            ["c6a", "c6i", "c7i", "m6a", "m6i", "m7i", "r6a", "r6i", "r7i"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["large", "xlarge", "2xlarge", "4xlarge"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-west-2a", "us-west-2b", "us-west-2c"]
      kubelet:
        maxPods: 110
        systemReserved:
          cpu: 100m
          memory: 100Mi
          ephemeral-storage: 1Gi
        evictionHard:
          memory.available: 5%
          nodefs.available: 10%
        imageGCHighThresholdPercent: 85
        imageGCLowThresholdPercent: 80
      taints:
        - key: workload-type
          value: general
          effect: NoSchedule
    metadata:
      labels:
        workload-type: general
        managed-by: karpenter
  limits:
    cpu: 1000
    memory: 1000Gi
  disruption:
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 30s
    budgets:
      - nodes: 10%
        duration: 5m
  weight: 10

Example 2: EC2NodeClass for AWS-Specific Configuration

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: KarpenterNodeRole-my-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
        kubernetes.io/role/internal-elb: "1"
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
    - name: my-cluster-node-security-group
  userData: |
    #!/bin/bash
    echo "Custom node initialization"
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        iops: 3000
        throughput: 125
        encrypted: true
        deleteOnTermination: true
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  detailedMonitoring: true
  tags:
    Name: karpenter-node
    Environment: production
    ManagedBy: karpenter
    ClusterName: my-cluster

Example 3: Specialized NodePools for Different Workloads

---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: gpu-workloads
spec:
  template:
    spec:
      nodeClassRef:
        name: gpu-nodes
      requirements:
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g5", "g6", "p4", "p5"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: Gt
          values: ["0"]
      taints:
        - key: nvidia.com/gpu
          value: "true"
          effect: NoSchedule
    metadata:
      labels:
        workload-type: gpu
        nvidia.com/gpu: "true"
  limits:
    cpu: 500
    memory: 2000Gi
    nvidia.com/gpu: 16
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 300s

---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: batch-workloads
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c6a", "c6i", "c7i", "m6a", "m6i"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["2xlarge", "4xlarge", "8xlarge"]
      taints:
        - key: workload-type
          value: batch
          effect: NoSchedule
    metadata:
      labels:
        workload-type: batch
        spot-interruption-handler: enabled
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 60s
    budgets:
      - nodes: 20%

---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: stateful-workloads
spec:
  template:
    spec:
      nodeClassRef:
        name: stateful-nodes
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["r6i", "r7i"]
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["xlarge", "2xlarge", "4xlarge"]
        - key: topology.kubernetes.io/zone
          operator: In
          values: ["us-west-2a", "us-west-2b"]
      kubelet:
        maxPods: 50
      taints:
        - key: workload-type
          value: stateful
          effect: NoSchedule
    metadata:
      labels:
        workload-type: stateful
        storage-optimized: "true"
  limits:
    cpu: 200
    memory: 800Gi
  disruption:
    consolidationPolicy: WhenEmpty
    consolidateAfter: 600s
    budgets:
      - nodes: 1
        duration: 30m

Example 4: Disruption Budgets and Consolidation Policy

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: production-apps
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["c6i", "m6i", "r6i"]
  disruption:
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 30s
    expireAfter: 720h
    budgets:
      - nodes: 5%
        duration: 8h
        schedule: "0 8 * * MON-FRI"
      - nodes: 20%
        duration: 16h
        schedule: "0 18 * * MON-FRI"
      - nodes: 30%
        duration: 48h
        schedule: "0 0 * * SAT"
      - nodes: 10%

Example 5: Pod Scheduling with Karpenter

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-application
spec:
  replicas: 5
  selector:
    matchLabels:
      app: my-application
  template:
    metadata:
      labels:
        app: my-application
    spec:
      tolerations:
        - key: workload-type
          operator: Equal
          value: general
          effect: NoSchedule
      nodeSelector:
        workload-type: general
        karpenter.sh/capacity-type: spot
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: my-application
                topologyKey: topology.kubernetes.io/zone
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 50
              preference:
                matchExpressions:
                  - key: kubernetes.io/arch
                    operator: In
                    values: ["arm64"]
            - weight: 30
              preference:
                matchExpressions:
                  - key: karpenter.k8s.aws/instance-size
                    operator: In
                    values: ["2xlarge", "4xlarge"]
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: my-application
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: my-application
      containers:
        - name: app
          image: my-app:latest
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              cpu: 1000m
              memory: 2Gi
          lifecycle:
            preStop:
              exec:
                command:
                  - /bin/sh
                  - -c
                  - sleep 15
      terminationGracePeriodSeconds: 30

Example 6: Spot Instance Handling and Fallback

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: spot-with-fallback
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - "c5a"
            - "c6a"
            - "c6i"
            - "c7i"
            - "m5a"
            - "m6a"
            - "m6i"
            - "m7i"
            - "r5a"
            - "r6a"
            - "r6i"
            - "r7i"
        - key: karpenter.k8s.aws/instance-size
          operator: In
          values: ["large", "xlarge", "2xlarge", "4xlarge"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
    metadata:
      labels:
        spot-enabled: "true"
      annotations:
        karpenter.sh/spot-to-spot-consolidation: "true"
  disruption:
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 30s
    budgets:
      - nodes: 25%
  weight: 5

Example 7: Karpenter with Pod Disruption Budgets

apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-service
spec:
  replicas: 6
  selector:
    matchLabels:
      app: critical-service
  template:
    metadata:
      labels:
        app: critical-service
    spec:
      tolerations:
        - key: workload-type
          operator: Equal
          value: general
          effect: NoSchedule
      containers:
        - name: app
          image: critical-service:latest
          resources:
            requests:
              cpu: 1000m
              memory: 2Gi
            limits:
              cpu: 2000m
              memory: 4Gi
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-service-pdb
spec:
  minAvailable: 4
  selector:
    matchLabels:
      app: critical-service

Example 8: Multi-Architecture NodePool

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: multi-arch
spec:
  template:
    spec:
      nodeClassRef:
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - "c6g"
            - "m6g"
            - "r6g"
            - "c7g"
            - "m7g"
            - "r7g"
            - "c6i"
            - "m6i"
            - "r6i"
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
    metadata:
      labels:
        multi-arch: "true"
  disruption:
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 60s
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: KarpenterNodeRole-my-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster

Monitoring and Troubleshooting

Key Metrics to Monitor

karpenter_nodes_created_total
karpenter_nodes_terminated_total
karpenter_provisioner_scheduling_duration_seconds
karpenter_disruption_replacement_node_initialized_seconds
karpenter_disruption_consolidation_actions_performed_total
karpenter_disruption_budgets_allowed_disruptions
karpenter_provisioner_instance_type_price_estimate
karpenter_cloudprovider_instance_type_offering_price_estimate
karpenter_pods_state

Common Issues and Solutions

Issue: Pods stuck in Pending

  • Check that NodePool requirements match the pod's node selectors/tolerations
  • Verify cloud provider limits have not been exceeded
  • Check instance type availability in the selected zones
  • Ensure subnet capacity is available
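A few kubectl commands that help diagnose Pending pods (resource names here are examples):

```shell
# Why is the pod unschedulable? Check the Events section for scheduler/Karpenter messages
kubectl describe pod my-pending-pod

# Inspect NodePool status and consumed capacity against its limits
kubectl get nodepools
kubectl describe nodepool default

# Check the Karpenter controller logs for provisioning errors (capacity, IAM, subnet issues)
kubectl logs -n karpenter deployment/karpenter | grep -i error
```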

Issue: Excessive node churn

  • Tune the consolidation delay (consolidateAfter)
  • Review disruption budgets
  • Check that pod resource requests are accurate
  • Consider WhenEmpty instead of WhenUnderutilized

Issue: Costs remain high despite Karpenter

  • Enable consolidation if it is not active
  • Verify that spot instances are actually being used
  • Check for pods with unnecessarily large resource requests
  • Review instance type selection (allow more variety)

Issue: Spot interruptions cause service disruption

  • Implement Pod Disruption Budgets
  • Use diverse instance types for better spot availability
  • Configure adequate replica counts
  • Implement graceful shutdown in applications

Terraform Integration

resource "helm_release" "karpenter" {
  namespace        = "karpenter"
  create_namespace = true
  name             = "karpenter"
  repository       = "oci://public.ecr.aws/karpenter"
  chart            = "karpenter"
  version          = "v0.33.0"
  values = [
    <<-EOT
    settings:
      clusterName: ${var.cluster_name}
      clusterEndpoint: ${var.cluster_endpoint}
      interruptionQueue: ${var.interruption_queue_name}
    serviceAccount:
      annotations:
        eks.amazonaws.com/role-arn: ${var.karpenter_irsa_arn}
    controller:
      resources:
        requests:
          cpu: 1
          memory: 1Gi
        limits:
          cpu: 2
          memory: 2Gi
    EOT
  ]
  depends_on = [
    aws_iam_role_policy_attachment.karpenter_controller
  ]
}

resource "kubectl_manifest" "karpenter_nodepool_default" {
  yaml_body = <<-YAML
    apiVersion: karpenter.sh/v1beta1
    kind: NodePool
    metadata:
      name: default
    spec:
      template:
        spec:
          nodeClassRef:
            name: default
          requirements:
            - key: karpenter.sh/capacity-type
              operator: In
              values: ["spot", "on-demand"]
            - key: karpenter.k8s.aws/instance-family
              operator: In
              values: ["c6i", "m6i", "r6i"]
      limits:
        cpu: 1000
        memory: 1000Gi
      disruption:
        consolidationPolicy: WhenUnderutilized
        consolidateAfter: 30s
  YAML
  depends_on = [helm_release.karpenter]
}

Migration from Cluster Autoscaler

  1. Plan the migration

    • Identify current node groups and their characteristics
    • Map workloads to new NodePool configurations
    • Plan a coexistence period
  2. Deploy Karpenter alongside Cluster Autoscaler

    • Install Karpenter in the cluster
    • Create NodePools with distinct labels
    • Test with non-critical workloads first
  3. Migrate workloads incrementally

    • Update pod specs with Karpenter tolerations/node selectors
    • Monitor provisioning and consolidation behavior
    • Validate cost and performance metrics
  4. Remove Cluster Autoscaler

    • Once all workloads are migrated, scale down CA node groups
    • Remove the Cluster Autoscaler deployment
    • Clean up CA-specific resources
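The final step above can be sketched as follows for EKS — the node group name, cluster name, and Cluster Autoscaler deployment location are examples; adjust to your environment:

```shell
# Cordon and drain the nodes in the old CA-managed node group
kubectl cordon -l eks.amazonaws.com/nodegroup=legacy-ng
kubectl drain -l eks.amazonaws.com/nodegroup=legacy-ng \
  --ignore-daemonsets --delete-emptydir-data

# Scale the managed node group down (AWS example; maxSize must stay >= 1)
aws eks update-nodegroup-config --cluster-name my-cluster \
  --nodegroup-name legacy-ng \
  --scaling-config minSize=0,maxSize=1,desiredSize=0

# Remove the Cluster Autoscaler deployment once workloads have moved
kubectl delete deployment cluster-autoscaler -n kube-system
```

Drain node-group by node-group and verify Karpenter provisions replacement capacity before proceeding to the next group.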