DeploymentDocumentation deployment-documentation

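This document is a comprehensive deployment guide covering infrastructure setup, CI/CD pipelines, deployment strategies, and rollback plans for software projects.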

DevOps · Updated 3/3/2026

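Deployment Documentation

Overview

Create comprehensive deployment documentation covering infrastructure setup, CI/CD pipelines, deployment procedures, and rollback strategies.

When to Use

  • Deployment guides
  • Infrastructure documentation
  • CI/CD pipeline setup
  • Configuration management
  • Container orchestration
  • Cloud infrastructure documentation
  • Release procedures
  • Rollback procedures

Deployment Guide Template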

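# Deployment Guide

## Overview

This document describes the deployment process for [Application Name].

**Deployment methods:**
- Manual deployment (emergencies only)
- Automated CI/CD (preferred)
- Blue-green deployment
- Canary deployment

**Environments:**
- Development: https://dev.example.com
- Staging: https://staging.example.com
- Production: https://example.com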

---

## Prerequisites

### Required Tools

```bash
# Install required tools (Homebrew)
brew install node@18
brew install postgresql@14
brew install redis
brew install docker
brew install kubectl
brew install helm
brew install awscli
```

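### Access Requirements

- [ ] GitHub repository access
- [ ] AWS console access (IAM user with a deployment policy)
- [ ] Kubernetes cluster access (kubeconfig)
- [ ] Docker Hub credentials
- [ ] Datadog API key (monitoring)
- [ ] PagerDuty access (on-call)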

### Environment Variables

```bash
# .env.production
# Placeholder values only: keep real secrets in a secret manager, not in version control.
NODE_ENV=production
DATABASE_URL=postgresql://user:pass@db.example.com:5432/prod
REDIS_URL=redis://cache.example.com:6379
API_KEY=your-api-key
JWT_SECRET=your-jwt-secret
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
```
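A deployment that starts with a missing variable tends to fail late and confusingly, so it can help to check the environment at startup. The following is a minimal sketch, not part of the original guide: the function names `findMissingEnv` and `assertEnv` are hypothetical, and the required list simply mirrors the variable names in the `.env.production` example (the AWS key pair is left out on the assumption that credentials may come from an IAM role instead).

```javascript
// Variable names taken from the .env.production example in this guide.
// The AWS access key pair is deliberately omitted (assumption: it may be
// supplied by an IAM role rather than environment variables).
const REQUIRED_VARS = [
  'NODE_ENV',
  'DATABASE_URL',
  'REDIS_URL',
  'API_KEY',
  'JWT_SECRET',
  'AWS_REGION',
];

// Return the names of required variables that are unset or blank.
function findMissingEnv(env, required = REQUIRED_VARS) {
  return required.filter((name) => !env[name] || env[name].trim() === '');
}

// Throw a single descriptive error listing every missing variable.
function assertEnv(env = process.env) {
  const missing = findMissingEnv(env);
  if (missing.length > 0) {
    throw new Error(
      `Missing required environment variables: ${missing.join(', ')}`
    );
  }
}
```

Calling `assertEnv()` as the first statement of the server entry point would make a misconfigured pod crash immediately, which the rollout status check can then surface.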

## CI/CD Pipeline

### GitHub Actions Workflow

```yaml
# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    branches: [main]
  workflow_dispatch:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'
      - run: npm ci
      - run: npm test
      - run: npm run lint

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Log in to Amazon ECR
        id: login-ecr  # required so steps.login-ecr.outputs.registry resolves below
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build and push Docker image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/app:$IMAGE_TAG .
          docker push $ECR_REGISTRY/app:$IMAGE_TAG
          docker tag $ECR_REGISTRY/app:$IMAGE_TAG $ECR_REGISTRY/app:latest
          docker push $ECR_REGISTRY/app:latest

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBECONFIG }}

      - name: Deploy to Kubernetes
        env:
          IMAGE_TAG: ${{ github.sha }}
        run: |
          # Replace your-registry with the registry the build job pushed to
          kubectl set image deployment/app \
            app=your-registry/app:$IMAGE_TAG \
            -n production

          kubectl rollout status deployment/app -n production

      - name: Notify Datadog
        run: |
          curl -X POST "https://api.datadoghq.com/api/v1/events" \
            -H "Content-Type: application/json" \
            -H "DD-API-KEY: ${{ secrets.DATADOG_API_KEY }}" \
            -d '{
              "title": "Deployed to production",
              "text": "Deployed version ${{ github.sha }}",
              "tags": ["environment:production", "service:app"]
            }'

      - name: Notify Slack
        if: always()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "Deployment ${{ job.status }}: ${{ github.sha }}"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
```

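## Docker Configuration

### Dockerfile

```dockerfile
# Multi-stage build to keep the runtime image small
FROM node:18-alpine AS builder

WORKDIR /app

# Copy package manifests
COPY package*.json ./

# Install all dependencies (the build step typically needs devDependencies)
RUN npm ci

# Copy source code
COPY . .

# Build the application, then drop devDependencies before copying node_modules
RUN npm run build && npm prune --omit=dev

# Production stage
FROM node:18-alpine

# Security: run as a non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001

WORKDIR /app

# Copy the built application from the builder stage
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/package*.json ./
# The HEALTHCHECK below needs this script (assumed to live at the repository root)
COPY --from=builder --chown=nodejs:nodejs /app/healthcheck.js ./

# Switch to the non-root user
USER nodejs

# Expose the application port
EXPOSE 3000

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node healthcheck.js

# Start the application
CMD ["node", "dist/server.js"]
```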

### docker-compose.yml

```yaml
version: '3.8'

services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://postgres:password@db:5432/app
      - REDIS_URL=redis://redis:6379
    depends_on:
      - db
      - redis
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "node", "healthcheck.js"]
      interval: 30s
      timeout: 3s
      retries: 3

  db:
    image: postgres:14-alpine
    environment:
      - POSTGRES_DB=app
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped

  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    restart: unless-stopped

volumes:
  postgres_data:
  redis_data:
```

## Kubernetes Deployment

### Deployment Manifest

```yaml
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: production
  labels:
    app: app
    version: v1
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
        version: v1
    spec:
      containers:
      - name: app
        image: your-registry/app:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
          name: http
        env:
        - name: NODE_ENV
          value: "production"
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: database-url
        - name: REDIS_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: redis-url
        resources:
          requests:
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 1000m
            memory: 1Gi
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 2

---
apiVersion: v1
kind: Service
metadata:
  name: app
  namespace: production
spec:
  selector:
    app: app
  ports:
  - port: 80
    targetPort: 3000
  type: ClusterIP

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  # Replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  tls:
  - hosts:
    - example.com
    secretName: app-tls
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app
            port:
              number: 80
```
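The `rollingUpdate` settings in the Deployment above (`maxSurge: 1`, `maxUnavailable: 0` with 3 replicas) bound how many pods exist and how many stay available during a rollout. As a rough illustration of that arithmetic, here is a small sketch; `rollingUpdateBounds` and `resolveCount` are hypothetical helper names, and the percentage handling follows the Kubernetes convention that `maxSurge` rounds up while `maxUnavailable` rounds down.

```javascript
// Convert an absolute count or a percentage string such as "25%" into pods.
function resolveCount(value, replicas, roundUp) {
  if (typeof value === 'number') return value;
  const percent = Number.parseFloat(value); // "25%" -> 25
  const exact = (replicas * percent) / 100;
  return roundUp ? Math.ceil(exact) : Math.floor(exact);
}

// Compute the pod-count envelope a RollingUpdate strategy allows.
function rollingUpdateBounds(replicas, maxSurge, maxUnavailable) {
  const surge = resolveCount(maxSurge, replicas, true); // rounds up
  const unavailable = resolveCount(maxUnavailable, replicas, false); // rounds down
  return {
    maxPods: replicas + surge, // upper bound on old + new pods
    minAvailable: replicas - unavailable, // lower bound on ready pods
  };
}

// Settings from the manifest in this guide: 3 replicas, surge 1, unavailable 0.
console.log(rollingUpdateBounds(3, 1, 0)); // { maxPods: 4, minAvailable: 3 }
```

With these values the cluster needs headroom for one extra pod, and capacity never drops below the full three replicas while the new version rolls out.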

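## Deployment Procedure

### 1. Pre-Deployment Checklist

- [ ] All tests pass
- [ ] Code review approved
- [ ] Security scan passed
- [ ] Database migrations ready
- [ ] Rollback plan documented
- [ ] Monitoring dashboards ready
- [ ] Team notified
- [ ] Maintenance window scheduled (if needed)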

### 2. Deployment Steps

```bash
# 1. Tag the release
git tag -a v1.2.3 -m "Release v1.2.3"
git push origin v1.2.3

# 2. Build and push the Docker image
docker build -t your-registry/app:v1.2.3 .
docker tag your-registry/app:v1.2.3 your-registry/app:latest
docker push your-registry/app:v1.2.3
docker push your-registry/app:latest

# 3. Run database migrations
# Note: this runs inside the currently deployed pods, so the migration files
# must already be present in that image.
kubectl exec -it deployment/app -n production -- npm run db:migrate

# 4. Deploy to Kubernetes
kubectl apply -f k8s/
kubectl set image deployment/app app=your-registry/app:v1.2.3 -n production

# 5. Wait for the rollout to finish
kubectl rollout status deployment/app -n production

# 6. Verify the deployment
kubectl get pods -n production
kubectl logs -f deployment/app -n production

# 7. Smoke tests (--fail makes curl exit non-zero on HTTP errors)
curl --fail https://example.com/health
curl --fail https://example.com/api/v1/status
```

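### 3. Post-Deployment Verification

```bash
# Check pod status
kubectl get pods -n production -l app=app

# Check the logs for errors
kubectl logs -f deployment/app -n production --tail=100

# Check metrics
curl https://example.com/metrics

# Run smoke tests
npm run test:smoke:production

# Monitoring verification
# - Check the Datadog dashboards
# - Check error rates
# - Check response times
# - Check resource usage
```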
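The guide calls `npm run test:smoke:production` without showing what that script does. One plausible shape, sketched here under assumptions, is a loop over the two endpoints the deployment steps already probe with curl (`/health` and `/api/v1/status`). The names `runSmokeChecks` and `failedChecks` are hypothetical, and the fetch function is injectable so the logic can be exercised without network access; by default it uses the global `fetch` available in Node 18 and later.

```javascript
// Endpoints taken from the curl smoke tests in the deployment steps.
const SMOKE_PATHS = ['/health', '/api/v1/status'];

// Request each path and record whether it answered with HTTP 200.
async function runSmokeChecks(baseUrl, fetchFn = fetch, paths = SMOKE_PATHS) {
  const results = [];
  for (const path of paths) {
    try {
      const res = await fetchFn(new URL(path, baseUrl));
      results.push({ path, ok: res.status === 200, detail: `HTTP ${res.status}` });
    } catch (err) {
      // Network-level failures (DNS, TLS, connection refused) count as failed checks.
      results.push({ path, ok: false, detail: err.message });
    }
  }
  return results;
}

// Produce human-readable lines for every failed check.
function failedChecks(results) {
  return results.filter((r) => !r.ok).map((r) => `${r.path}: ${r.detail}`);
}
```

A runner script could call `runSmokeChecks('https://example.com')`, print `failedChecks(results)`, and set `process.exitCode = 1` when the list is non-empty so that CI marks the deployment as failed.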

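## Rollback Procedure

### Automated Rollback

```bash
# Roll back to the previous revision
kubectl rollout undo deployment/app -n production

# Roll back to a specific revision
kubectl rollout undo deployment/app -n production --to-revision=2

# Check rollback status
kubectl rollout status deployment/app -n production
```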

### Manual Rollback

```bash
# 1. Identify the last working version
kubectl rollout history deployment/app -n production

# 2. Deploy the previous version
kubectl set image deployment/app \
  app=your-registry/app:v1.2.2 \
  -n production

# 3. Roll back database migrations (if needed)
kubectl exec -it deployment/app -n production -- \
  npm run db:migrate:undo

# 4. Verify the rollback
kubectl get pods -n production
curl --fail https://example.com/health
```

## Blue-Green Deployment

```bash
# 1. Deploy the green environment
kubectl apply -f k8s/deployment-green.yaml

# 2. Wait for green to become ready
kubectl rollout status deployment/app-green -n production

# 3. Test the green environment
curl --fail https://green.example.com/health

# 4. Switch traffic to green
kubectl patch service app -n production \
  -p '{"spec":{"selector":{"version":"green"}}}'

# 5. Monitor for problems
# If problems appear: switch back to blue
kubectl patch service app -n production \
  -p '{"spec":{"selector":{"version":"blue"}}}'

# 6. If successful: remove the blue deployment
kubectl delete deployment app-blue -n production
```
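The cutover decision in steps 4 to 6 can be written down as a small pure function, which makes it easier to script or review. This is only a sketch of the flow described above: `otherColor`, `selectorPatch`, and `planCutover` are hypothetical names, and the patch string matches the JSON passed to `kubectl patch service` in the commands above.

```javascript
// Return the idle color for a given active color.
function otherColor(color) {
  if (color === 'blue') return 'green';
  if (color === 'green') return 'blue';
  throw new Error(`Unknown color: ${color}`);
}

// Build the JSON body used with `kubectl patch service app -p '<patch>'`.
function selectorPatch(color) {
  return JSON.stringify({ spec: { selector: { version: color } } });
}

// Decide where traffic should point and which deployment can be deleted.
function planCutover(activeColor, candidateHealthy) {
  const candidate = otherColor(activeColor);
  if (candidateHealthy) {
    // Candidate passed monitoring: keep traffic on it and retire the old color.
    return {
      routeTo: candidate,
      patch: selectorPatch(candidate),
      remove: `app-${activeColor}`,
    };
  }
  // Candidate misbehaved: route back to the active color and delete nothing.
  return { routeTo: activeColor, patch: selectorPatch(activeColor), remove: null };
}
```

For example, `planCutover('blue', true)` yields a patch selecting `green` and names `app-blue` for removal, mirroring steps 4 and 6.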

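## Monitoring and Alerting

### Health Check Endpoint

```javascript
// healthcheck.js
// Probes the local /health endpoint; exit code 0 means healthy, 1 means unhealthy.
const http = require('http');

const options = {
  host: 'localhost',
  port: 3000,
  path: '/health',
  timeout: 2000
};

const healthCheck = http.request(options, (res) => {
  if (res.statusCode === 200) {
    process.exit(0);
  } else {
    process.exit(1);
  }
});

healthCheck.on('error', () => {
  process.exit(1);
});

// The timeout option only emits 'timeout'; the request must be aborted explicitly.
healthCheck.on('timeout', () => {
  healthCheck.destroy();
  process.exit(1);
});

healthCheck.end();
```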
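The script above is the client side of the health check, and the Kubernetes probes also call `/ready`, but the guide does not show the server side. A minimal sketch of such handlers follows. It assumes a plain Node `http` server, and `createProbeHandler` and the `deps` flags object are hypothetical: the application would flip those booleans as its database and Redis connections come and go.

```javascript
// Build a request handler for the /health (liveness) and /ready (readiness) paths.
// `deps` maps a dependency name to a boolean that the application keeps current.
function createProbeHandler(deps) {
  return function handle(req, res) {
    if (req.url === '/health') {
      // Liveness: the process is up and able to answer HTTP.
      res.statusCode = 200;
      res.end(JSON.stringify({ status: 'ok' }));
      return true;
    }
    if (req.url === '/ready') {
      // Readiness: accept traffic only when every dependency is reachable.
      const failing = Object.keys(deps).filter((name) => !deps[name]);
      const ready = failing.length === 0;
      res.statusCode = ready ? 200 : 503;
      res.end(JSON.stringify({ status: ready ? 'ready' : 'not ready', failing }));
      return true;
    }
    return false; // not a probe path; let the application's router handle it
  };
}
```

In a server this could be wired in as the first step of the request listener passed to `http.createServer`, falling through to the normal routes when `handle` returns `false`. Keeping liveness independent of downstream dependencies avoids restart loops when only the database is down.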

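### Monitoring Checklist

- [ ] CPU usage < 70%
- [ ] Memory usage < 80%
- [ ] Error rate < 1%
- [ ] p95 response time < 500 ms
- [ ] Database connections healthy
- [ ] Redis connections healthy
- [ ] All pods running
- [ ] No pending deployments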
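The numeric items in this checklist lend themselves to an automated gate after a deployment. The sketch below encodes the four thresholds from the list; the metric key names (`cpuPercent`, `memoryPercent`, `errorRatePercent`, `p95LatencyMs`) and the function `findViolations` are assumptions for illustration, and how the values are fetched from the monitoring system is left out.

```javascript
// Limits from the monitoring checklist; each metric must be strictly below its limit.
const THRESHOLDS = {
  cpuPercent: 70,
  memoryPercent: 80,
  errorRatePercent: 1,
  p95LatencyMs: 500,
};

// Return a description of every metric that is missing or not below its limit.
function findViolations(metrics, thresholds = THRESHOLDS) {
  return Object.entries(thresholds)
    .filter(([name, limit]) => !(metrics[name] < limit))
    .map(([name, limit]) => `${name}=${metrics[name]} (limit < ${limit})`);
}
```

A missing metric is treated as a violation, on the reasoning that an absent signal after a deployment deserves attention as much as a bad one.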

## Infrastructure as Code

### Terraform Configuration

```hcl
# main.tf
# Excerpt: aws_lb_target_group.app is assumed to be defined elsewhere, and a
# Fargate service also needs a network_configuration block (subnets, security groups).
provider "aws" {
  region = "us-east-1"
}

resource "aws_ecs_cluster" "main" {
  name = "app-cluster"
}

resource "aws_ecs_service" "app" {
  name            = "app-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 3
  launch_type     = "FARGATE" # matches requires_compatibilities below

  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 3000
  }
}

resource "aws_ecs_task_definition" "app" {
  family                   = "app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "512"
  memory                   = "1024"

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = "your-registry/app:latest"
      essential = true
      portMappings = [
        {
          containerPort = 3000
          protocol      = "tcp"
        }
      ]
      environment = [
        {
          name  = "NODE_ENV"
          value = "production"
        }
      ]
    }
  ])
}
```

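## Best Practices

### ✅ Do
- Use infrastructure as code
- Implement CI/CD pipelines
- Use container orchestration
- Implement health checks
- Use rolling deployments
- Have a rollback procedure
- Monitor deployments
- Document emergency procedures
- Use secret management
- Implement blue-green or canary deployments

### ❌ Don't
- Deploy directly to production
- Skip tests before deploying
- Forget to back up before migrations
- Deploy without a rollback plan
- Skip monitoring after deploying
- Hardcode credentials
- Deploy during peak hours (unless necessary)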

## Resources

- [Kubernetes documentation](https://kubernetes.io/docs/)
- [Docker documentation](https://docs.docker.com/)
- [GitHub Actions](https://docs.github.com/en/actions)
- [AWS ECS](https://docs.aws.amazon.com/ecs/)
- [Terraform](https://www.terraform.io/docs)