Name: Kubernetes Operations Skill
Rating: 5 (1 review)
Author: rohitg00

name: kubernetes-operations
description: Kubernetes operations, including manifests, Helm charts, operators, troubleshooting, and resource management

Kubernetes Operations

Deployment Manifest

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    app: api-server
    version: v1
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
        version: v1
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.2.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-server

Always set resource requests and limits. Use topology spread constraints for high availability.

Helm Chart Structure

chart/
  Chart.yaml
  values.yaml
  values-staging.yaml
  values-production.yaml
  templates/
    deployment.yaml
    service.yaml
    ingress.yaml
    hpa.yaml
    _helpers.tpl

# values.yaml
replicaCount: 2
image:
  repository: registry.example.com/api
  tag: "1.2.0"
  pullPolicy: IfNotPresent
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilization: 70
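
The chart layout above lists per-environment values files without showing one. As a minimal sketch, a production override file might contain only the keys that differ from values.yaml. Every number below is an illustrative assumption, not a recommendation from the original chart.

```yaml
# values-production.yaml (hypothetical overrides; numbers are illustrative only)
# Keys not listed here fall back to the defaults in values.yaml.
replicaCount: 3
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: "1"
    memory: 1Gi
autoscaling:
  minReplicas: 3
  maxReplicas: 20
```

Helm merges a file passed with `-f` on top of the chart's values.yaml, so a command such as `helm upgrade --install api-server ./chart -f chart/values-production.yaml` (the release name here is an assumption) would apply these overrides while keeping every other default.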

HorizontalPodAutoscaler

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300

Troubleshooting Commands

# Pod diagnostics
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -c <container> --previous
kubectl exec -it <pod-name> -- /bin/sh

# Resource usage
kubectl top pods -n <namespace> --sort-by=memory
kubectl top nodes

# Network debugging (run nslookup from the shell inside the debug pod)
kubectl run debug --image=nicolaka/netshoot --rm -it -- bash
nslookup <service-name>.<namespace>.svc.cluster.local

# Events sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Find pods that are not running (this also lists completed pods in the Succeeded phase)
kubectl get pods -A --field-selector='status.phase!=Running'

Anti-Patterns

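Running containers as root without setting securityContext.runAsNonRoot: true
Missing resource requests/limits (causes scheduling problems and noisy neighbors)
Using the latest tag instead of pinned image versions
Not setting a PodDisruptionBudget for critical workloads
Storing secrets in ConfigMaps instead of Secrets (or an external secrets manager)
Ignoring pod anti-affinity for replicated Deployments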
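
Two of the anti-patterns above have short fixes that are easy to show. The fragments below are a sketch against the api-server Deployment from earlier. The UID and the maxUnavailable value are assumptions chosen for illustration, and the image must actually be able to run as a non-root user for the first fragment to work.

```yaml
# Fragment to merge into the api-server Deployment (not a complete manifest).
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true        # kubelet refuses to start containers that would run as UID 0
        runAsUser: 10001          # hypothetical non-root UID
      containers:
        - name: api
          securityContext:
            allowPrivilegeEscalation: false
```

```yaml
# PodDisruptionBudget for the same workload.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server
spec:
  maxUnavailable: 1               # assumed value; allows one voluntary eviction at a time
  selector:
    matchLabels:
      app: api-server
```

maxUnavailable is used here rather than minAvailable so that node drains can still proceed when the HorizontalPodAutoscaler has scaled the Deployment down to its minimum of 2 replicas.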

Checklist

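[ ] All containers have resource requests and limits
[ ] Liveness and readiness probes are configured
[ ] Images use specific version tags, not latest
[ ] Secrets are stored in Kubernetes Secrets or an external vault
[ ] PodDisruptionBudget is set for production workloads
[ ] NetworkPolicies restrict traffic between namespaces
[ ] Topology spread constraints or anti-affinity are in place for HA
[ ] Helm values are split per environment (staging, production)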
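
The NetworkPolicy item in the checklist can be sketched as a policy that admits ingress only from pods in the same namespace. The policy name and the production namespace are assumptions. The policy only takes effect on clusters whose network plugin enforces NetworkPolicy.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace-only   # hypothetical name
  namespace: production             # hypothetical namespace
spec:
  podSelector: {}                   # applies to every pod in this namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}           # any pod in the same namespace; other namespaces are denied
```

Workloads that must accept traffic from outside the namespace, such as those behind an ingress controller, would need an additional rule with a namespaceSelector for that source.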