DevOps Automation and GitOps Practices (devops-automation)

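This skill covers designing and implementing automated CI/CD pipelines with modern tools such as GitHub Actions, Docker, Kubernetes, Helm, and ArgoCD. It applies continuous integration, continuous deployment, and GitOps patterns to make software development, testing, and deployment more efficient, reliable, and secure. Keywords: DevOps, CI/CD, automated deployment, containers, cloud native, GitOps, monitoring and alerting, best practices.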

CI/CD · Updated 3/8/2026

name: devops-automation
description: Design CI/CD pipelines using GitHub Actions, Docker, Kubernetes, Helm, and GitOps patterns

DevOps Automation

GitHub Actions Workflow Structure

name: CI/CD
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
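  # Cancel superseded PR runs, but let runs on main finish so an in-progress deploy is never interrupted
  cancel-in-progress: ${{ github.event_name == 'pull_request' }}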

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
          cache: 'npm'
      - run: npm ci
      - run: npm run lint

  test:
    runs-on: ubuntu-latest
    needs: lint
    strategy:
      matrix:
        node-version: [20, 22]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
      - run: npm ci
      - run: npm test -- --coverage
      - uses: actions/upload-artifact@v4
        with:
          name: coverage-${{ matrix.node-version }}
          path: coverage/

  deploy:
    runs-on: ubuntu-latest
    needs: test
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh

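Key patterns:

  • Use concurrency to cancel superseded runs
  • Cache dependencies with the cache option of the setup actions
  • Use needs to express job dependencies
  • Gate deployments with environment protection rules
  • Use a matrix to test across versions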
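Together, the needs and matrix keys determine what actually runs. The Python sketch below models the workflow above as a dependency graph so the resulting plan is visible: jobs whose needs are all satisfied can run in parallel, and a matrix fans one job out into one run per combination. The JOBS table is a hypothetical, hand-written simplification rather than anything GitHub Actions exposes. It also ignores if: conditions and the default behavior of skipping dependents when a needed job fails.

```python
from graphlib import TopologicalSorter
from itertools import product

# Hypothetical hand-written model of the workflow above: job -> needs and matrix.
JOBS = {
    "lint": {"needs": [], "matrix": {}},
    "test": {"needs": ["lint"], "matrix": {"node-version": [20, 22]}},
    "deploy": {"needs": ["test"], "matrix": {}},
}


def expand_matrix(matrix):
    """Return one dict per matrix combination, or a single empty dict if there is no matrix."""
    if not matrix:
        return [{}]
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


def execution_stages(jobs):
    """Group jobs into stages; every job in a stage needs only jobs from earlier stages."""
    sorter = TopologicalSorter({name: spec["needs"] for name, spec in jobs.items()})
    sorter.prepare()  # raises graphlib.CycleError if the needs graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(execution_stages(JOBS), start=1):
        for job in stage:
            for combo in expand_matrix(JOBS[job]["matrix"]):
                print(f"stage {number}: {job} {combo}")
```

Running it prints lint in stage 1, two test runs (Node 20 and Node 22) in stage 2, and deploy in stage 3. This is why the lint job gates every matrix leg of test.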

Docker Multi-Stage Build

# Stage 1: install production dependencies only
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
# --omit=dev replaces the deprecated --production flag in current npm
RUN npm ci --omit=dev

# Stage 2: install all dependencies (including devDependencies) and build
FROM node:22-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 3: minimal runtime image that runs as an unprivileged user
FROM node:22-alpine AS runner
WORKDIR /app
RUN addgroup -g 1001 appgroup && adduser -u 1001 -G appgroup -S appuser
COPY --from=deps /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
USER appuser
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]

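Rules:

  • Use a specific image tag; never use latest
  • Run as a non-root user
  • Copy only the files the app needs into the final stage
  • Add a HEALTHCHECK so Docker, Compose, and Swarm can track container health (Kubernetes ignores it and relies on its own probes)
  • Use .dockerignore to exclude node_modules, .git, and test files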
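Several of these rules can be checked automatically. The following minimal Python sketch assumes a simple line-based parse. It checks the base-image tag rule and, for the final stage only, the non-root and HEALTHCHECK rules. It does not cover the file-copy or .dockerignore rules, and the helper names image_tag and lint_dockerfile are hypothetical.

```python
def image_tag(image):
    """Return the tag (or digest) of an image reference, or None if it has neither."""
    if "@" in image:  # digest-pinned, e.g. node@sha256:...
        return image.split("@", 1)[1]
    name = image.rsplit("/", 1)[-1]  # drop registry and path so a port such as :5000 is ignored
    return name.split(":", 1)[1] if ":" in name else None


def lint_dockerfile(text):
    """Check a Dockerfile against three of the rules above and return a list of problems.

    Simplifications: one instruction per line (no backslash continuations), no ARG
    substitution, and external base images are assumed to set neither USER nor HEALTHCHECK.
    """
    problems = []
    stages = {}  # stage name -> (user, has_healthcheck) at the end of that stage
    current = None  # name of the stage being parsed, if it has one
    user, has_healthcheck = None, False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.upper()
        args = rest.split()
        if keyword == "FROM":
            if current is not None:
                stages[current] = (user, has_healthcheck)
            args = [a for a in args if not a.startswith("--")]  # skip flags such as --platform
            image = args[0]
            if image in stages:
                # FROM <earlier stage> inherits that stage's USER and HEALTHCHECK
                user, has_healthcheck = stages[image]
            else:
                if image_tag(image) in (None, "latest"):
                    problems.append(f"unpinned base image: {image}")
                user, has_healthcheck = None, False
            current = args[2] if len(args) >= 3 and args[1].upper() == "AS" else None
        elif keyword == "USER":
            user = args[0] if args else None
        elif keyword == "HEALTHCHECK":
            has_healthcheck = bool(args) and args[0].upper() != "NONE"
    # Only the final stage ends up in the shipped image
    if user in (None, "root", "0"):
        problems.append("final stage runs as root")
    if not has_healthcheck:
        problems.append("final stage has no HEALTHCHECK")
    return problems
```

Run against the multi-stage Dockerfile above, it should report nothing. A single-stage FROM node:latest file with no USER or HEALTHCHECK would be flagged on all three rules.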

Kubernetes Deployment Manifest

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
  labels:
    app: api-server
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api-server
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: api-server
    spec:
      containers:
        - name: api
          image: registry.example.com/api:v1.2.3
          ports:
            - containerPort: 3000
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 15
            periodSeconds: 20
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: api-secrets
                  key: database-url

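Always set resource requests and limits, and always define readiness and liveness probes. For zero-downtime rollouts, set maxUnavailable: 0 together with a nonzero maxSurge. Kubernetes then starts a new pod and waits for it to pass its readiness probe before terminating an old one.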
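The rollout guarantees come from simple arithmetic on maxSurge and maxUnavailable. This Python sketch mirrors how the Deployment controller resolves the two values and reports the bounds a rollout stays within. A percentage of replicas is rounded up for maxSurge and down for maxUnavailable, and each defaults to 25%. The function names are hypothetical: this is a model for reasoning, not Kubernetes code.

```python
def resolve(value, replicas, round_up):
    """Convert an int or a percentage string such as "25%" into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = replicas * int(value[:-1])
        return -(-scaled // 100) if round_up else scaled // 100  # ceil or floor of scaled / 100
    return int(value)


def rollout_bounds(replicas, max_surge="25%", max_unavailable="25%"):
    """Return the most pods that may exist and the fewest that must stay available during a rollout."""
    if max_surge in (0, "0%") and max_unavailable in (0, "0%"):
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    surge = resolve(max_surge, replicas, round_up=True)
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # Both rounded down to 0: the controller allows one unavailable pod so the rollout can progress
        unavailable = 1
    return {"max_pods": replicas + surge, "min_available": replicas - unavailable}


if __name__ == "__main__":
    print(rollout_bounds(3, max_surge=1, max_unavailable=0))
```

For the manifest above (3 replicas, maxSurge: 1, maxUnavailable: 0), it prints {'max_pods': 4, 'min_available': 3}. The cluster therefore needs room for one extra pod during every rollout, and capacity never drops below three ready pods.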

Helm Chart Structure

chart/
  Chart.yaml
  values.yaml
  values-staging.yaml
  values-production.yaml
  templates/
    deployment.yaml
    service.yaml
    ingress.yaml
    hpa.yaml
    _helpers.tpl

# values.yaml
replicaCount: 2
image:
  repository: registry.example.com/api
  tag: ""  # set per release, e.g. --set image.tag=<git-sha>; never latest
  pullPolicy: IfNotPresent
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi
ingress:
  enabled: true
  host: api.example.com
autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilization: 70

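Override settings per environment with values-{env}.yaml. The chart's values.yaml is always loaded first, and files passed with -f take precedence over it. Check the chart with helm lint, and render it with helm template before deploying to review the generated manifests.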
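Helm layers values files in a fixed way. The chart's values.yaml comes first, and each -f file is merged over it in order, with later files winning. Nested maps are merged key by key, lists and scalars are replaced wholesale, and setting a key to null removes it. The Python sketch below models that merge on plain dicts, leaving out YAML parsing so it stays standard-library only. merge_values is a hypothetical helper, and the production overrides shown are illustrative, not taken from this chart.

```python
from copy import deepcopy


def merge_values(base, override):
    """Return base overlaid with override, following Helm's values-file merge rules."""
    result = deepcopy(base)
    for key, value in override.items():
        if value is None:
            result.pop(key, None)  # null in an override deletes the default
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)  # maps merge recursively
        else:
            result[key] = deepcopy(value)  # scalars and lists replace the default
    return result


# Subset of the values.yaml above
defaults = {
    "replicaCount": 2,
    "image": {"repository": "registry.example.com/api", "pullPolicy": "IfNotPresent"},
    "autoscaling": {"enabled": True, "minReplicas": 2, "maxReplicas": 10},
}
# Illustrative values-production.yaml contents (not from the original chart)
production = {"autoscaling": {"minReplicas": 3, "maxReplicas": 20}}

if __name__ == "__main__":
    print(merge_values(defaults, production)["autoscaling"])
```

The printed autoscaling block keeps enabled: True from the defaults and takes both replica bounds from the production file. This is why an environment file only needs to list the keys that differ.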

ArgoCD GitOps Pattern

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: api-server
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/org/k8s-manifests
    targetRevision: main
    path: apps/api-server
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

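GitOps principles:

  • Git is the single source of truth for cluster state
  • All changes go through pull requests (no kubectl apply against production)
  • ArgoCD automatically syncs the cluster to what is in Git
  • Enable selfHeal so manual changes made directly in the cluster are reverted
  • Keep application code and deployment manifests in separate repositories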
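As a rough mental model of what prune and selfHeal change, the Python sketch below compares the desired state rendered from Git with the live cluster state and lists the actions an automated sync would take. It is a simplified, hypothetical model: plan_sync and the resource keys are made up, and real ArgoCD diffs normalized manifests. Without selfHeal, drift that happens while Git is unchanged is only reported as OutOfSync. Without prune, resources deleted from Git stay in the cluster.

```python
def plan_sync(desired, live, git_changed, prune=True, self_heal=True):
    """List the actions an automated sync would take.

    desired: resources rendered from Git, keyed like "Deployment/api-server"
    live: resources currently in the cluster, with the same keys
    git_changed: True if a new commit triggered this comparison
    """
    if not git_changed and not self_heal:
        return []  # drift is shown as OutOfSync but not corrected automatically
    actions = []
    for key in sorted(desired):
        if key not in live:
            actions.append(("create", key))
        elif live[key] != desired[key]:
            actions.append(("update", key))
    if prune:
        for key in sorted(live.keys() - desired.keys()):
            actions.append(("delete", key))  # in the cluster but no longer in Git
    return actions


if __name__ == "__main__":
    # Someone ran "kubectl scale" by hand; Git still says 3 replicas
    desired = {"Deployment/api-server": {"replicas": 3}}
    live = {"Deployment/api-server": {"replicas": 5}}
    print(plan_sync(desired, live, git_changed=False))
```

With selfHeal enabled, the hand-scaled Deployment is put back to 3 replicas. With it disabled, the same drift would sit in the UI as OutOfSync until the next commit.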

Monitoring Stack

# Prometheus ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-server
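spec:
  # Selects Services (not Pods) by label; the matching Service must
  # define a port named "metrics" for the endpoint below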
  selector:
    matchLabels:
      app: api-server
  endpoints:
    - port: metrics
      interval: 15s
      path: /metrics

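Key metrics to expose:

  • http_request_duration_seconds (histogram) - request latency by route and status
  • http_requests_total (counter) - request count by route and status
  • process_resident_memory_bytes (gauge) - memory usage
  • db_query_duration_seconds (histogram) - database query latency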
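Histograms such as http_request_duration_seconds are exported as cumulative _bucket series, one per upper bound (the le label). PromQL's histogram_quantile estimates a percentile from them, for example histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))). The Python sketch below reproduces that estimate: it finds the bucket containing the target rank, then interpolates linearly inside it. This shows why the result can only be as precise as the bucket boundaries. The bucket values in the example are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by bound, ending with
    (math.inf, total_count), like the le-labelled _bucket series.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(buckets) < 2 or buckets[-1][1] == 0:
        return math.nan
    rank = q * buckets[-1][1]
    # First finite bucket whose cumulative count reaches the rank
    index = next((i for i, (_, count) in enumerate(buckets[:-1]) if count >= rank), len(buckets) - 1)
    if index == len(buckets) - 1:
        return buckets[-2][0]  # rank falls in the +Inf bucket: report the highest finite bound
    upper, count = buckets[index]
    if index == 0 and upper <= 0:
        return upper
    lower = 0.0
    if index > 0:
        lower, below = buckets[index - 1]
        count -= below
        rank -= below
    return lower + (upper - lower) * rank / count  # linear interpolation inside the bucket


if __name__ == "__main__":
    latency = [(0.1, 50), (0.5, 90), (1.0, 98), (2.0, 100), (math.inf, 100)]
    print(histogram_quantile(0.99, latency))
```

With these made-up buckets, P99 comes out as 1.5 s: 98 of 100 requests finished within 1 s, so the 99th falls halfway through the 1 s to 2 s bucket. A P99 latency alert at 2 seconds therefore only makes sense if the histogram has a bucket boundary at or near 2 s.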

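Alert conditions: error rate above 1%, P99 latency above 2 seconds, memory above 80% of the limit, and more than 3 pod restarts within 10 minutes.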
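In practice these thresholds live in Prometheus alerting rules. The Python sketch below only spells out the comparison each one makes, so the boundaries are unambiguous (for instance, exactly 1% errors does not fire). The Snapshot fields, the alert names, and the choice to count 5xx responses as errors are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical readings for one service over the evaluation window."""
    requests: int            # increase of http_requests_total
    errors: int              # same, restricted to 5xx statuses (assumed definition of "error")
    p99_seconds: float       # e.g. from histogram_quantile(0.99, ...)
    memory_bytes: int        # e.g. process_resident_memory_bytes
    memory_limit_bytes: int  # the container's resources.limits.memory
    restarts_10m: int        # container restarts in the last 10 minutes


def firing_alerts(s):
    """Return the (hypothetical) names of the alerts whose thresholds are exceeded."""
    alerts = []
    if s.requests and s.errors / s.requests > 0.01:  # error rate > 1%
        alerts.append("HighErrorRate")
    if s.p99_seconds > 2.0:  # P99 latency > 2 s
        alerts.append("HighLatencyP99")
    if s.memory_bytes > 0.8 * s.memory_limit_bytes:  # memory > 80% of the limit
        alerts.append("HighMemoryUsage")
    if s.restarts_10m > 3:  # more than 3 restarts in 10 minutes
        alerts.append("PodRestartingOften")
    return alerts


if __name__ == "__main__":
    limit = 512 * 1024**2  # the 512Mi limit from the Deployment above
    print(firing_alerts(Snapshot(10_000, 150, 0.8, 300 * 1024**2, limit, 0)))
```

The example prints ['HighErrorRate']: 150 errors out of 10,000 requests is 1.5%. Memory (300Mi against a 409.6Mi threshold), latency, and restarts all stay under their limits.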

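Pipeline Best Practices

  1. Keep CI under 10 minutes (parallel jobs, aggressive caching)
  2. Run linting and type checking before tests
  3. Use ephemeral environments for PR previews
  4. Pin every action to a full commit SHA rather than a tag
  5. Store secrets in GitHub Secrets, never in workflow files
  6. Use OIDC to authenticate to cloud providers (no long-lived credentials)
  7. Tag images with the git SHA, not latest
  8. Run security scans (Trivy, Snyk) on container images in CI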
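Practice 4 is straightforward to enforce in CI. The Python sketch below scans workflow text for uses: references that are not pinned to a full 40-character commit SHA. It is a line-oriented regex check under simple assumptions, not a YAML parser: one uses: per line, with local ./ actions and docker:// images skipped. The function name unpinned_actions is hypothetical. The example workflow at the top of this skill, which uses @v4 tags, would be flagged.

```python
import re

# Matches "uses: <ref>" (optionally as a list item), capturing the ref up to whitespace or a comment
USES_LINE = re.compile(r"^[ \t]*(?:-[ \t]*)?uses:[ \t]*([^\s#]+)", re.MULTILINE)
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text):
    """Return every uses: reference whose version is not a full commit SHA."""
    unpinned = []
    for ref in USES_LINE.findall(workflow_text):
        ref = ref.strip("'\"")
        if ref.startswith(("./", "docker://")):
            continue  # local actions and container images are out of scope here
        _, _, version = ref.partition("@")
        if not FULL_SHA.fullmatch(version):
            unpinned.append(ref)
    return unpinned
```

A job could call this on every file under .github/workflows and fail the build when any list is non-empty. A trailing comment such as # v4 after the SHA keeps the pinned version readable for humans.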