name: kubernetes-operations
description: Kubernetes operations, including manifests, Helm charts, operators, troubleshooting, and resource management
Kubernetes Operations
Deployment Manifest
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    app: api-server
    version: v1
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: api-server
  template:
    metadata:
      labels:
        app: api-server
        version: v1
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.2.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: url
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api-server
```
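Always set resource requests and limits: the scheduler places Pods according to their requests, and limits cap what a container may consume. Use topology spread constraints for high availability; here `maxSkew: 1` on `kubernetes.io/hostname` keeps the number of replicas per node within one of each other.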
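The constraint in the manifest spreads replicas across nodes only, so every replica could still land in a single zone. A minimal sketch of also spreading across zones, assuming the nodes carry the well-known `topology.kubernetes.io/zone` label (most cloud providers set it). This list would replace `topologySpreadConstraints` in the Pod template:

```yaml
topologySpreadConstraints:
  # Hard rule: replica counts per node may differ by at most 1
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: api-server
  # Soft rule: prefer an even spread across zones, but still schedule if that is impossible
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: api-server
```

Making the zone rule `ScheduleAnyway` is a judgment call: with an uneven number of nodes per zone, a hard zone constraint can leave Pods Pending.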
Helm Chart Structure
```
chart/
  Chart.yaml
  values.yaml
  values-staging.yaml
  values-production.yaml
  templates/
    deployment.yaml
    service.yaml
    ingress.yaml
    hpa.yaml
    _helpers.tpl
```
```yaml
# values.yaml
replicaCount: 2
image:
  repository: registry.example.com/api
  tag: "1.2.0"
  pullPolicy: IfNotPresent
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilization: 70
```
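A sketch of how `templates/hpa.yaml` might consume the `autoscaling` values above. It assumes the Deployment template names the Deployment `{{ .Release.Name }}`; a real chart would more likely use a fullname helper defined in `_helpers.tpl`, whose name is not given here.

```yaml
# templates/hpa.yaml (sketch; assumes the Deployment is named {{ .Release.Name }})
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ .Release.Name }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ .Release.Name }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilization }}
{{- end }}
```

Because the HPA owns the replica count when autoscaling is enabled, the Deployment template usually emits `replicas: {{ .Values.replicaCount }}` only inside `{{- if not .Values.autoscaling.enabled }}`. Running `helm template api ./chart -f chart/values-production.yaml` renders the manifests locally, so each environment's overrides can be checked before they are applied.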
HorizontalPodAutoscaler
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
```
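How these targets translate into replica counts: per the Kubernetes HPA documentation, the controller computes a desired count for each metric, acts on the largest, and clamps the result to `minReplicas`/`maxReplicas`. The readings below (4 replicas at 91% average CPU and 60% average memory utilization) are hypothetical:

```latex
\begin{aligned}
\text{desiredReplicas} &= \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil \\
\text{CPU:}\quad \left\lceil 4 \times \tfrac{91}{70} \right\rceil &= \lceil 5.2 \rceil = 6 \\
\text{memory:}\quad \left\lceil 4 \times \tfrac{60}{80} \right\rceil &= \lceil 3.0 \rceil = 3 \\
\text{desiredReplicas} &= \max(6, 3) = 6 \in [2, 10]
\end{aligned}
```

The controller skips scaling when the metric ratio is within its tolerance (0.1 by default). With `stabilizationWindowSeconds: 300`, a scale-down goes only to the highest recommendation computed during the last five minutes, which damps flapping when load dips briefly.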
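Troubleshooting Commands
```bash
# Pod diagnostics
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -c <container> -n <namespace> --previous   # logs of the previous (crashed) container instance
kubectl exec -it <pod-name> -n <namespace> -- /bin/sh

# Resource usage (requires metrics-server)
kubectl top pods -n <namespace> --sort-by=memory
kubectl top nodes

# Network debugging: start a throwaway Pod, then run lookups from its shell
kubectl run debug --image=nicolaka/netshoot --rm -it -n <namespace> -- bash
nslookup <service-name>.<namespace>.svc.cluster.local   # run inside the debug Pod

# Events sorted by time
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

# Find Pods that are not Running (also lists Succeeded Pods from completed Jobs)
kubectl get pods -A --field-selector=status.phase!=Running
```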
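Anti-patterns
- Running containers as root without setting `securityContext.runAsNonRoot: true`
- Missing resource requests/limits (causes scheduling problems and noisy neighbors)
- Using the `latest` tag instead of pinned image versions
- Not setting a PodDisruptionBudget for critical workloads
- Storing secrets in ConfigMaps instead of Secrets (or an external secret manager)
- Ignoring Pod anti-affinity for replicated Deployments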
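A minimal sketch of Pod template fields that address the root-user and anti-affinity anti-patterns for the `api-server` Deployment. The UID `10001` is a hypothetical value; the image must be able to run as that non-root user, and `readOnlyRootFilesystem: true` assumes the application writes only to mounted volumes such as an `emptyDir`.

```yaml
# Fragment of the Deployment's spec.template (merge into the Pod template)
spec:
  securityContext:
    runAsNonRoot: true          # kubelet refuses to start a container that would run as UID 0
    runAsUser: 10001            # hypothetical non-root UID; the image must support it
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: api
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true   # assumes writes go only to mounted volumes
        capabilities:
          drop: ["ALL"]
  affinity:
    podAntiAffinity:
      # Soft rule: prefer placing api-server replicas on different nodes
      # (an alternative to the hostname topology spread constraint)
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app: api-server
```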
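Checklist
- [ ] All containers have resource requests and limits
- [ ] Liveness and readiness probes are configured
- [ ] Images use specific version tags, not `latest`
- [ ] Secrets are stored in Kubernetes Secrets or an external vault
- [ ] A PodDisruptionBudget is set for production workloads
- [ ] NetworkPolicies restrict traffic between namespaces
- [ ] Topology spread constraints or anti-affinity provide HA
- [ ] Helm values are split per environment (staging, production)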
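A minimal sketch of the PodDisruptionBudget and NetworkPolicy items for the `api-server` Deployment. `minAvailable: 2` assumes the Deployment runs at least three replicas, and the policy name `api-server-allow-same-namespace` is hypothetical. NetworkPolicies take effect only if the cluster's network plugin enforces them.

```yaml
# PodDisruptionBudget: keep at least 2 api-server Pods running during
# voluntary disruptions such as node drains
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api-server
---
# NetworkPolicy: once this policy selects the api-server Pods, only the
# listed ingress is allowed: TCP 8080 from Pods in the same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-server-allow-same-namespace
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}   # all Pods in the policy's own namespace
      ports:
        - protocol: TCP
          port: 8080
```

As written, the policy also blocks an Ingress controller running in its own namespace; such a source needs an additional `from` entry with a `namespaceSelector`.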