name: devops-automation
description: Design CI/CD pipelines with GitHub Actions, Docker, Kubernetes, Helm, and GitOps patterns
DevOps Automation
GitHub Actions Workflow Structure
name: CI/CD
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: 'npm'
      - run: npm ci
      - run: npm run lint
  test:
    runs-on: ubuntu-latest
    needs: lint
    strategy:
      matrix:
        node-version: [20, 22]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
      - run: npm ci
      - run: npm test -- --coverage
      - uses: actions/upload-artifact@v4
        with:
          name: coverage-${{ matrix.node-version }}
          path: coverage/
  deploy:
    runs-on: ubuntu-latest
    needs: test
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh
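Key patterns:
- Use concurrency to cancel stale runs
- Use the cache option of the setup actions to cache dependencies
- Use needs to express job dependencies
- Use environment protection rules to gate deployments
- Use a matrix for cross-version testing
Docker Multi-Stage Build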
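# Stage 1: production dependencies only
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
# --omit=dev replaces the deprecated --production flag
RUN npm ci --omit=dev

# Stage 2: install all dependencies and build the application
FROM node:22-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 3: minimal runtime image
FROM node:22-alpine AS runner
WORKDIR /app
# Create an unprivileged user and group to run the application
RUN addgroup -g 1001 appgroup && adduser -u 1001 -G appgroup -S appuser
# Copy only the runtime dependencies and the build output
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]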
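Rules:
- Use specific image tags, never the latest tag
- Run as a non-root user
- Copy only the necessary files into the final stage
- Add a HEALTHCHECK so Docker-based orchestrators (Compose, Swarm) can detect unhealthy containers; Kubernetes ignores it and relies on its own probes
- Use .dockerignore to exclude node_modules, .git, and test files
Kubernetes Deployment Manifest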
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    app: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: registry.example.com/api:v1.2.3
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: database-url
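Always set resource requests and limits. Always define readiness and liveness probes. Use maxUnavailable: 0 (together with maxSurge: 1) for zero-downtime deployments, so that a new Pod must become ready before an old one is removed.
Helm Chart Structure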
chart/
  Chart.yaml
  values.yaml
  values-staging.yaml
  values-production.yaml
  templates/
    deployment.yaml
    service.yaml
    ingress.yaml
    hpa.yaml
    _helpers.tpl
# values.yaml
replicaCount: 2
image:
  repository: registry.example.com/api
  tag: v1.2.3  # pin a specific version (for example, the git SHA); avoid latest
  pullPolicy: IfNotPresent
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
ingress:
  enabled: true
  host: api.example.com
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilization: 70
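Use values-{env}.yaml files for per-environment overrides. Check the chart with helm lint, and render it with helm template to inspect the output before deploying.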
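As a rough illustration of a per-environment override, a values-production.yaml might list only the keys that differ from values.yaml. The replica counts, the image tag, and the release name api in the comments are assumptions chosen for illustration, not values from a real release.

```yaml
# values-production.yaml (hypothetical overrides; only keys that differ from values.yaml)
replicaCount: 3
image:
  tag: v1.2.3          # assumed release tag; pin a specific version
autoscaling:
  minReplicas: 3       # assumed production floor
  maxReplicas: 20      # assumed production ceiling

# The chart's own values.yaml is loaded first; files passed with -f override it.
# Check and render locally before deploying:
#   helm lint ./chart -f chart/values-production.yaml
#   helm template api ./chart -f chart/values-production.yaml
```

Keeping the override file small makes the difference between environments easy to review in a PR.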
ArgoCD GitOps Pattern
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api-server
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/k8s-manifests
    targetRevision: main
    path: apps/api-server
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
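GitOps principles:
- Git is the single source of truth for cluster state
- All changes go through PRs (no kubectl apply in production)
- ArgoCD automatically syncs from Git to the cluster
- Enable selfHeal to revert manual cluster changes
- Keep the application code repository separate from the deployment manifests repository
Monitoring Stack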
# Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-server
spec:
  selector:
    matchLabels:
      app: api-server
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics
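A ServiceMonitor selects Services rather than Pods, and its port field refers to a named Service port. A minimal sketch of a Service that the ServiceMonitor above would match might look like this; it assumes the application serves /metrics on the same container port (3000) as the API, which the source does not state.

```yaml
# Hypothetical Service matched by the ServiceMonitor
apiVersion: v1
kind: Service
metadata:
  name: api-server
  labels:
    app: api-server        # matched by the ServiceMonitor's selector.matchLabels
spec:
  selector:
    app: api-server        # routes traffic to the Deployment's Pods
  ports:
    - name: metrics        # referenced by "port: metrics" in the ServiceMonitor
      port: 3000
      targetPort: 3000     # assumption: metrics share the API port
```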
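Key metrics to expose:
- http_request_duration_seconds (histogram): request latency by route and status
- http_requests_total (counter): request count by route and status
- process_resident_memory_bytes (gauge): memory usage
- db_query_duration_seconds (histogram): database query latency
Alert conditions: error rate > 1%, P99 latency > 2 seconds, memory > 80% of the limit, more than 3 Pod restarts within 10 minutes.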
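The alert conditions could be expressed as a Prometheus Operator PrometheusRule along the following lines. This is a sketch under several assumptions: the rule name, the for durations, the 5-minute rate windows, and the severity labels are illustrative; http_requests_total is assumed to carry a status label holding the HTTP status code; the memory threshold hard-codes the 512Mi limit from the Deployment; and the restart counter is assumed to come from kube-state-metrics.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: api-server-alerts            # hypothetical name
spec:
  groups:
    - name: api-server
      rules:
        - alert: HighErrorRate
          # share of 5xx responses over the last 5 minutes above 1%
          expr: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
              / sum(rate(http_requests_total[5m])) > 0.01
          for: 5m
          labels:
            severity: critical
        - alert: HighP99Latency
          # 99th percentile request latency above 2 seconds
          expr: |
            histogram_quantile(0.99,
              sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 2
          for: 5m
          labels:
            severity: warning
        - alert: HighMemoryUsage
          # 80% of the 512Mi container limit, expressed in bytes
          expr: process_resident_memory_bytes > 0.8 * 512 * 1024 * 1024
          for: 10m
          labels:
            severity: warning
        - alert: FrequentPodRestarts
          # more than 3 container restarts within 10 minutes
          expr: increase(kube_pod_container_status_restarts_total{container="api"}[10m]) > 3
          labels:
            severity: critical
```

Depending on how the Prometheus resource is configured, the rule object may also need labels that match its ruleSelector before it is picked up.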
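Pipeline Best Practices
- Keep CI under 10 minutes (parallel jobs, aggressive caching)
- Run lint and type checks before tests
- Use ephemeral environments for PR previews
- Pin all action versions to commit SHAs, not tags
- Store secrets in GitHub Secrets, never in workflow files
- Use OIDC for cloud provider authentication (no long-lived keys)
- Tag images with the git SHA, not latest
- Run security scans on container images in CI (Trivy, Snyk)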
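Two of these practices, OIDC authentication and SHA-based image tags, can be sketched as a single job. The example assumes AWS with an ECR registry, which the source does not specify; the job name, role ARN, region, and the api repository name are placeholders. Action versions are shown as tags for readability and would be pinned to full commit SHAs in a real workflow.

```yaml
jobs:
  build-image:                       # hypothetical job name
    runs-on: ubuntu-latest
    permissions:
      id-token: write                # allows the job to request an OIDC token
      contents: read
    steps:
      - uses: actions/checkout@v4
      # Exchange the OIDC token for short-lived credentials; no stored access keys
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789012:role/ci-deploy   # placeholder ARN
          aws-region: us-east-1                                      # placeholder region
      - id: ecr
        uses: aws-actions/amazon-ecr-login@v2
      # Tag the image with the commit SHA so every build is traceable
      - run: |
          docker build -t ${{ steps.ecr.outputs.registry }}/api:${{ github.sha }} .
          docker push ${{ steps.ecr.outputs.registry }}/api:${{ github.sha }}
```

Other cloud providers offer equivalent OIDC login actions; the permissions block with id-token: write is the part that stays the same.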