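# Deployment Documentation
## Overview
Create comprehensive deployment documentation covering infrastructure setup, CI/CD pipelines, deployment procedures, and rollback strategies.
## When to Use
- Deployment guides
- Infrastructure documentation
- CI/CD pipeline setup
- Configuration management
- Container orchestration
- Cloud infrastructure documentation
- Release procedures
- Rollback procedures
## Deployment Guide Template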
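# Deployment Guide
## Overview
This document describes the deployment process for [Application Name].

**Deployment methods:**
- Manual deployment (emergencies only)
- Automated CI/CD (preferred)
- Blue-green deployment
- Canary deployment

**Environments:**
- Development: https://dev.example.com
- Staging: https://staging.example.com
- Production: https://example.com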
---
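## Prerequisites
### Required Tools
```bash
# Install the required tools (Homebrew)
brew install node@18      # keg-only: follow brew's instructions to add it to PATH
brew install postgresql@14
brew install redis
brew install docker       # CLI only; for Docker Desktop use: brew install --cask docker
brew install kubectl
brew install helm
brew install awscli       # the Homebrew formula is "awscli", not "aws-cli"
```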
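The formulas above install binaries under different names (for example, `awscli` provides `aws` and `postgresql@14` provides `psql`). A minimal preflight sketch in Node.js, assuming POSIX-style PATH lookup; `findOnPath`, `missingTools`, and the binary list are hypothetical, not part of any existing tooling:

```javascript
// preflight.js: hypothetical check that the required CLIs are on PATH.
// Binary names are assumptions based on what each Homebrew formula installs.
const fs = require("fs");
const path = require("path");

const REQUIRED_BINARIES = ["node", "psql", "redis-server", "docker", "kubectl", "helm", "aws"];

// Return the absolute path of an executable file called `name` found in one of
// the PATH directories, or null if there is none.
function findOnPath(name, pathVar = process.env.PATH || "") {
  for (const dir of pathVar.split(path.delimiter)) {
    if (!dir) continue;
    const candidate = path.join(dir, name);
    try {
      fs.accessSync(candidate, fs.constants.X_OK);
      if (fs.statSync(candidate).isFile()) return candidate;
    } catch {
      // Missing or not executable in this directory; keep looking.
    }
  }
  return null;
}

// List the required binaries that could not be found.
function missingTools(names = REQUIRED_BINARIES, pathVar = process.env.PATH || "") {
  return names.filter((name) => findOnPath(name, pathVar) === null);
}
```

Running `console.log(missingTools())` before a manual deployment lists anything still missing; keg-only formulas such as `node@18` only appear once their `bin` directory has been added to PATH.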
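### Access Requirements
- [ ] GitHub repository access
- [ ] AWS console access (IAM user with a deployment policy)
- [ ] Kubernetes cluster access (kubeconfig)
- [ ] Docker Hub credentials
- [ ] Datadog API key (monitoring)
- [ ] PagerDuty access (on-call)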
### Environment Variables
```bash
# .env.production
# Placeholder values only: never commit real secrets; inject them from a secret manager.
NODE_ENV=production
DATABASE_URL=postgresql://user:pass@db.example.com:5432/prod
REDIS_URL=redis://cache.example.com:6379
API_KEY=your-api-key
JWT_SECRET=your-jwt-secret
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=AKIA...
AWS_SECRET_ACCESS_KEY=...
```
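A missing or malformed variable otherwise surfaces only on the first database or Redis call, so the app can validate its configuration at startup instead. A sketch, assuming the variable names above; `loadConfig` is hypothetical, and the AWS key pair is left optional on the assumption that production pods may obtain credentials from an IAM role:

```javascript
// config.js: hypothetical fail-fast loader for the variables listed above.
const REQUIRED_VARS = ["NODE_ENV", "DATABASE_URL", "REDIS_URL", "API_KEY", "JWT_SECRET", "AWS_REGION"];

function loadConfig(env = process.env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name] || env[name].trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  // Parse the URLs now so a malformed value fails at startup, not on first use.
  const databaseUrl = new URL(env.DATABASE_URL);
  const redisUrl = new URL(env.REDIS_URL);
  return {
    nodeEnv: env.NODE_ENV,
    database: {
      host: databaseUrl.hostname,
      port: Number(databaseUrl.port || 5432),
      name: databaseUrl.pathname.slice(1), // "/prod" -> "prod"
    },
    redis: { host: redisUrl.hostname, port: Number(redisUrl.port || 6379) },
    apiKey: env.API_KEY,
    jwtSecret: env.JWT_SECRET,
    awsRegion: env.AWS_REGION,
  };
}
```

Calling `loadConfig()` as the first line of `dist/server.js` makes a misconfigured pod crash immediately, which `kubectl rollout status` reports as a failed rollout rather than as runtime errors later.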
## CI/CD Pipeline
### GitHub Actions Workflow
```yaml
# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    branches: [main]
  workflow_dispatch:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
        with:
          node-version: '18'
      - run: npm ci
      - run: npm test
      - run: npm run lint

  build:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Log in to Amazon ECR
        id: login-ecr   # referenced below as steps.login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build and push Docker image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/app:$IMAGE_TAG .
          docker push $ECR_REGISTRY/app:$IMAGE_TAG
          docker tag $ECR_REGISTRY/app:$IMAGE_TAG $ECR_REGISTRY/app:latest
          docker push $ECR_REGISTRY/app:latest

  deploy:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Configure kubectl
        uses: azure/k8s-set-context@v3
        with:
          method: kubeconfig
          kubeconfig: ${{ secrets.KUBECONFIG }}

      - name: Deploy to Kubernetes
        env:
          IMAGE_TAG: ${{ github.sha }}
        run: |
          # Replace your-registry with the ECR registry the build job pushed to
          kubectl set image deployment/app \
            app=your-registry/app:$IMAGE_TAG \
            -n production
          kubectl rollout status deployment/app -n production

      - name: Notify Datadog
        run: |
          curl -X POST "https://api.datadoghq.com/api/v1/events" \
            -H "Content-Type: application/json" \
            -H "DD-API-KEY: ${{ secrets.DATADOG_API_KEY }}" \
            -d '{
              "title": "Deploy to production",
              "text": "Deployed version ${{ github.sha }}",
              "tags": ["environment:production", "service:app"]
            }'

      - name: Notify Slack
        if: always()
        uses: slackapi/slack-github-action@v1
        with:
          payload: |
            {
              "text": "Deployment ${{ job.status }}: ${{ github.sha }}"
            }
        env:
          SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
          SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
```
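The Datadog step hand-writes JSON inside a single-quoted shell string, which breaks as soon as a value contains a quote. If the notification moves into a Node script, the body can be built with `JSON.stringify` instead. A sketch using the same endpoint and `DD-API-KEY` header as the workflow; `buildDeployEvent`, `postDeployEvent`, and the SHA format check are hypothetical:

```javascript
// deploy-event.js: hypothetical builder for the Datadog deployment event.
function buildDeployEvent({ sha, environment = "production", service = "app" }) {
  // github.sha is a lowercase hex commit id; reject anything else early.
  if (!/^[0-9a-f]{7,40}$/.test(sha)) {
    throw new Error(`Unexpected commit SHA: ${sha}`);
  }
  return JSON.stringify({
    title: `Deploy to ${environment}`,
    text: `Deployed version ${sha}`,
    tags: [`environment:${environment}`, `service:${service}`],
  });
}

// POST the event to the v1 events endpoint. Requires Node 18+ for global fetch.
async function postDeployEvent(apiKey, body) {
  const res = await fetch("https://api.datadoghq.com/api/v1/events", {
    method: "POST",
    headers: { "Content-Type": "application/json", "DD-API-KEY": apiKey },
    body,
  });
  if (!res.ok) {
    throw new Error(`Datadog returned HTTP ${res.status}`);
  }
  return res.json();
}
```

In the workflow this would run as `node deploy-event.js` with `DATADOG_API_KEY` and `GITHUB_SHA` passed through `env:`; the key then never appears in the command line.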
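## Docker Configuration
### Dockerfile
```dockerfile
# Multi-stage build
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package manifests first so the dependency layer is cached
COPY package*.json ./
# Install all dependencies: the build step usually needs devDependencies
RUN npm ci
# Copy source code
COPY . .
# Build the application
RUN npm run build
# Drop devDependencies so only production packages reach the final image
RUN npm prune --omit=dev

# Production stage
FROM node:18-alpine
# Security: run as a non-root user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nodejs -u 1001
WORKDIR /app
# Copy the built application from the builder stage
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/package*.json ./
# HEALTHCHECK runs healthcheck.js, so it must be copied into the image too
COPY --from=builder --chown=nodejs:nodejs /app/healthcheck.js ./
# Switch to the non-root user
USER nodejs
# Expose the application port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD node healthcheck.js
# Start the application
CMD ["node", "dist/server.js"]
```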
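Because the final `CMD` uses exec form, `node` runs as PID 1 and receives `SIGTERM` directly when `docker stop` runs or when Kubernetes replaces a pod during a rolling update. A sketch of draining the HTTP server on that signal; `createShutdownHandler` and its 10-second default are assumptions, chosen to stay under Kubernetes' default 30-second termination grace period:

```javascript
// shutdown.js: hypothetical SIGTERM handling for the server in dist/server.js.
function createShutdownHandler(server, { timeoutMs = 10000, exit = (code) => process.exit(code) } = {}) {
  let shuttingDown = false;
  return function shutdown(signal) {
    if (shuttingDown) return; // ignore repeated signals
    shuttingDown = true;
    console.log(`Received ${signal}, closing server`);
    // Force an exit if in-flight requests do not finish in time.
    const timer = setTimeout(() => exit(1), timeoutMs);
    timer.unref();
    // Stop accepting connections; the callback runs once open ones have closed.
    server.close((err) => {
      clearTimeout(timer);
      exit(err ? 1 : 0);
    });
  };
}

// Usage in the server entry point:
//   const server = http.createServer(app).listen(3000);
//   const shutdown = createShutdownHandler(server);
//   process.on("SIGTERM", shutdown);
//   process.on("SIGINT", shutdown);
```

Injecting `exit` keeps the handler testable without terminating the process.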
### docker-compose.yml
```yaml
version: '3.8'
services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=production
      - DATABASE_URL=postgresql://postgres:password@db:5432/app
      - REDIS_URL=redis://redis:6379
    depends_on:
      - db
      - redis
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "node", "healthcheck.js"]
      interval: 30s
      timeout: 3s
      retries: 3
  db:
    image: postgres:14-alpine
    environment:
      - POSTGRES_DB=app
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    restart: unless-stopped
volumes:
  postgres_data:
  redis_data:
```
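`depends_on` only controls start order: it does not wait until Postgres or Redis accept connections, so the app should retry at startup (Compose can also gate on a dependency `healthcheck` with `condition: service_healthy`). A minimal capped exponential backoff sketch; `backoffDelays`, `connectWithRetry`, and the delay values are hypothetical:

```javascript
// backoff.js: hypothetical startup retry for database and Redis connections.

// Capped exponential backoff: baseMs * 2^attempt, never above maxMs.
function backoffDelays(retries, baseMs = 500, maxMs = 8000) {
  return Array.from({ length: retries }, (_, attempt) => Math.min(baseMs * 2 ** attempt, maxMs));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `connect` until it resolves, waiting between failed attempts.
// Makes at most retries + 1 attempts, then rethrows the last error.
async function connectWithRetry(connect, { retries = 6, baseMs = 500, maxMs = 8000 } = {}) {
  const delays = backoffDelays(retries, baseMs, maxMs);
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      console.warn(`Connection attempt ${attempt + 1} failed, retrying in ${delays[attempt]} ms`);
      await sleep(delays[attempt]);
    }
  }
  throw lastError;
}
```

With the defaults, `backoffDelays(6)` yields waits of 500, 1000, 2000, 4000, 8000 and 8000 ms, so the app gives up after roughly 23.5 seconds and lets `restart: unless-stopped` take over.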
## Kubernetes Deployment
### Deployment Manifest
```yaml
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: production
  labels:
    app: app
    version: v1
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
        version: v1
    spec:
      containers:
        - name: app
          image: your-registry/app:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: redis-url
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 2
---
apiVersion: v1
kind: Service
metadata:
  name: app
  namespace: production
spec:
  selector:
    app: app
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app
  namespace: production
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  # Replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: nginx
  tls:
    - hosts:
        - example.com
      secretName: app-tls
  rules:
    - host: example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: app
                port:
                  number: 80
```
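The `rollingUpdate` fields bound how many pods exist during a rollout. A small sketch of that arithmetic based on the documented Deployment rules (percentages are taken of `replicas`, `maxSurge` rounds up, `maxUnavailable` rounds down); `rollingUpdateBounds` is a hypothetical helper, and the both-zero fallback reflects our understanding of the Deployment controller:

```javascript
// rollout-bounds.js: hypothetical model of RollingUpdate pod-count limits.

// Resolve an absolute number or a percentage string such as "25%".
function resolveFencepost(value, replicas, roundUp) {
  if (typeof value === "string" && value.endsWith("%")) {
    const raw = (replicas * Number(value.slice(0, -1))) / 100;
    return roundUp ? Math.ceil(raw) : Math.floor(raw);
  }
  return Number(value);
}

// Defaults match the Kubernetes defaults of 25% for both fields.
function rollingUpdateBounds(replicas, maxSurge = "25%", maxUnavailable = "25%") {
  const surge = resolveFencepost(maxSurge, replicas, true);
  let unavailable = resolveFencepost(maxUnavailable, replicas, false);
  // If both round to 0 the rollout could never progress, so one pod is
  // allowed to be unavailable instead.
  if (surge === 0 && unavailable === 0) unavailable = 1;
  return {
    maxPods: replicas + surge,            // most pods (old + new) at any moment
    minAvailable: replicas - unavailable, // fewest pods that must stay available
  };
}
```

For the manifest above (`replicas: 3`, `maxSurge: 1`, `maxUnavailable: 0`), `rollingUpdateBounds(3, 1, 0)` gives `{ maxPods: 4, minAvailable: 3 }`: one new pod starts before an old one is removed, so capacity never drops below three, at the cost of needing room in the cluster for a fourth pod.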
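## Deployment Procedure
### 1. Pre-Deployment Checklist
- [ ] All tests pass
- [ ] Code review approved
- [ ] Security scan passed
- [ ] Database migrations ready
- [ ] Rollback plan documented
- [ ] Monitoring dashboards ready
- [ ] Team notified
- [ ] Maintenance window scheduled (if needed)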
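### 2. Deployment Steps
```bash
# 1. Tag the release
git tag -a v1.2.3 -m "Release v1.2.3"
git push origin v1.2.3
# 2. Build and push the Docker image
docker build -t your-registry/app:v1.2.3 .
docker tag your-registry/app:v1.2.3 your-registry/app:latest
docker push your-registry/app:v1.2.3
docker push your-registry/app:latest
# 3. Run database migrations
# Note: this runs inside the currently deployed pods, so the new migration files must
# already be in the running image; otherwise run them from the new image (e.g. as a Job).
kubectl exec -it deployment/app -n production -- npm run db:migrate
# 4. Deploy to Kubernetes
kubectl apply -f k8s/
kubectl set image deployment/app app=your-registry/app:v1.2.3 -n production
# 5. Wait for the rollout to finish
kubectl rollout status deployment/app -n production
# 6. Verify the deployment
kubectl get pods -n production
kubectl logs -f deployment/app -n production   # Ctrl+C to stop following
# 7. Smoke tests (-f makes curl fail on HTTP error responses)
curl -fsS https://example.com/health
curl -fsS https://example.com/api/v1/status
```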
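### 3. Post-Deployment Verification
```bash
# Check pod status
kubectl get pods -n production -l app=app
# Check the logs for errors
kubectl logs -f deployment/app -n production --tail=100
# Check metrics
curl https://example.com/metrics
# Run smoke tests
npm run test:smoke:production
# Monitoring checks
# - Check the Datadog dashboard
# - Check the error rate
# - Check response times
# - Check resource usage
```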
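The `test:smoke:production` script itself is not shown in this guide. One possible shape, assuming Node 18+ (global `fetch` and `AbortSignal.timeout`) and the two endpoints used in the deployment steps; `runSmokeTests` is hypothetical and accepts an injectable `fetchImpl` so it can be exercised without a live server:

```javascript
// smoke.js: hypothetical production smoke test.
const DEFAULT_PATHS = ["/health", "/api/v1/status"];

// Request each path once and record its status; network errors count as failures.
async function runSmokeTests(baseUrl, paths = DEFAULT_PATHS, fetchImpl = fetch, timeoutMs = 5000) {
  const results = [];
  for (const p of paths) {
    const url = new URL(p, baseUrl).toString();
    try {
      const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
      results.push({ url, status: res.status, ok: res.ok });
    } catch (err) {
      results.push({ url, status: null, ok: false, error: err.message });
    }
  }
  return results;
}
```

A CI step could run `runSmokeTests("https://example.com").then((r) => { console.table(r); process.exitCode = r.every((x) => x.ok) ? 0 : 1; })` so that any unhealthy endpoint fails the job.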
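## Rollback Procedure
### Rollback with `kubectl rollout undo`
```bash
# Roll back to the previous revision
kubectl rollout undo deployment/app -n production
# Roll back to a specific revision
kubectl rollout undo deployment/app -n production --to-revision=2
# Check the rollback status
kubectl rollout status deployment/app -n production
```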
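### Manual Rollback
```bash
# 1. Identify the last working revision
kubectl rollout history deployment/app -n production
# 2. Roll back database migrations (if needed)
# Run this while the newer image is still deployed: the older image may not
# contain the down-migration for a migration it never shipped.
kubectl exec -it deployment/app -n production -- \
  npm run db:migrate:undo
# 3. Deploy the previous version
kubectl set image deployment/app \
  app=your-registry/app:v1.2.2 \
  -n production
kubectl rollout status deployment/app -n production
# 4. Verify the rollback
kubectl get pods -n production
curl https://example.com/health
```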
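Step 1 depends on reading the `kubectl rollout history` table. A sketch of extracting the rollback target programmatically, assuming the default output format (a header line, then `REVISION  CHANGE-CAUSE` rows); `parseRolloutHistory` and `previousRevision` are hypothetical:

```javascript
// rollout-history.js: hypothetical parser for `kubectl rollout history` output.
function parseRolloutHistory(output) {
  return output
    .split("\n")
    .map((line) => line.match(/^\s*(\d+)(?:\s+(.*?))?\s*$/)) // "<revision>  <change-cause>"
    .filter(Boolean)
    .map(([, revision, cause]) => ({ revision: Number(revision), changeCause: cause || "" }))
    .sort((a, b) => a.revision - b.revision);
}

// The current revision is the highest number; the target is the one before it.
function previousRevision(output) {
  const revisions = parseRolloutHistory(output);
  if (revisions.length < 2) {
    throw new Error("No earlier revision to roll back to");
  }
  return revisions[revisions.length - 2].revision;
}
```

Keep in mind that `kubectl rollout undo` re-records the restored template under a new, higher revision number, so after one rollback "the previous revision" may be the release you just rolled away from; checking `CHANGE-CAUSE` or `kubectl rollout history deployment/app -n production --revision=N` before acting is safer.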
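## Blue-Green Deployment
```bash
# Assumes the live release runs as deployment app-blue (pod label version: blue)
# and the new release as app-green (pod label version: green). The patches below
# merge a version key into the Service "app" selector.
# 1. Deploy the green environment
kubectl apply -f k8s/deployment-green.yaml
# 2. Wait for green to become ready
kubectl rollout status deployment/app-green -n production
# 3. Test the green environment
curl https://green.example.com/health
# 4. Switch traffic to green
kubectl patch service app -n production \
  -p '{"spec":{"selector":{"version":"green"}}}'
# 5. Monitor for problems
# If problems appear: switch back to blue
kubectl patch service app -n production \
  -p '{"spec":{"selector":{"version":"blue"}}}'
# 6. If successful: remove the blue deployment
kubectl delete deployment app-blue -n production
```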
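Switching colors is a one-field change to the Service selector. A sketch that builds the same patch body and hands it to `kubectl` as an argument array, which avoids shell quoting; `selectorPatch`, `otherColor`, and `switchTrafficArgs` are hypothetical, and pods are assumed to carry a `version: blue` or `version: green` label:

```javascript
// blue-green.js: hypothetical helpers for the Service selector switch.
const COLORS = ["blue", "green"];

function otherColor(color) {
  if (!COLORS.includes(color)) throw new Error(`Unknown color: ${color}`);
  return color === "blue" ? "green" : "blue";
}

// Patch body for: kubectl patch service app -n production -p '<body>'
function selectorPatch(color) {
  otherColor(color); // throws on an unknown color
  return JSON.stringify({ spec: { selector: { version: color } } });
}

// kubectl arguments that point the Service at `color`.
function switchTrafficArgs(color, { service = "app", namespace = "production" } = {}) {
  return ["patch", "service", service, "-n", namespace, "-p", selectorPatch(color)];
}
```

Usage: `require("child_process").execFileSync("kubectl", switchTrafficArgs("green"), { stdio: "inherit" })`; rolling back is the same call with `otherColor("green")`. Because `kubectl patch` defaults to a strategic merge patch, the `version` key is merged into the existing selector and `app: app` is kept.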
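## Monitoring and Alerting
### Health Check Endpoints
```javascript
// healthcheck.js: exits with 0 if GET /health answers 200, otherwise with 1.
// Used by the Dockerfile HEALTHCHECK and the docker-compose healthcheck.
const http = require('http');

const options = {
  host: 'localhost',
  port: 3000,
  path: '/health',
  timeout: 2000
};

const healthCheck = http.request(options, (res) => {
  if (res.statusCode === 200) {
    process.exit(0);
  } else {
    process.exit(1);
  }
});

// `timeout` only emits a 'timeout' event; abort the request so the check fails fast
healthCheck.on('timeout', () => {
  healthCheck.destroy();
  process.exit(1);
});

healthCheck.on('error', () => {
  process.exit(1);
});

healthCheck.end();
```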
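healthcheck.js and the Kubernetes probes expect `/health` (liveness) and `/ready` (readiness) on port 3000. A minimal sketch with Node's built-in `http` module; `probeStatus`, `createProbeServer`, and the `state.ready` flag are hypothetical stand-ins for the app's real routes:

```javascript
// probes.js: hypothetical /health and /ready handlers.
const http = require("http");

// Liveness: the process is up and serving HTTP.
// Readiness: dependencies are connected, so the pod may receive traffic.
function probeStatus(pathname, state) {
  if (pathname === "/health") return 200;
  if (pathname === "/ready") return state.ready ? 200 : 503;
  return 404;
}

function createProbeServer(state) {
  return http.createServer((req, res) => {
    const { pathname } = new URL(req.url, "http://localhost");
    const status = probeStatus(pathname, state);
    res.writeHead(status, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ status: status === 200 ? "ok" : "unavailable" }));
  });
}
```

Usage: `const state = { ready: false }; createProbeServer(state).listen(3000);`, then set `state.ready = true` once the database and Redis connections succeed. Keeping `/health` independent of dependencies means a database outage marks pods unready (removed from the Service) instead of failing the liveness probe and restarting every pod.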
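### Monitoring Checklist
- [ ] CPU usage < 70%
- [ ] Memory usage < 80%
- [ ] Error rate < 1%
- [ ] p95 response time < 500 ms
- [ ] Database connections healthy
- [ ] Redis connections healthy
- [ ] All pods running
- [ ] No pending rollouts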
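The numeric thresholds in the checklist can be checked mechanically after a release. A sketch, assuming the metrics have already been fetched (from Datadog or elsewhere) into plain numbers; `evaluateGates`, `percentile`, and the field names are hypothetical, and p95 uses the nearest-rank definition:

```javascript
// gates.js: hypothetical check of the monitoring checklist thresholds.
const THRESHOLDS = {
  cpuPercent: 70,
  memoryPercent: 80,
  errorRatePercent: 1,
  p95LatencyMs: 500,
};

// Nearest-rank percentile: the smallest sample with at least p% of values <= it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error("No samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Return the failed checks; an empty array means every threshold is met.
function evaluateGates({ cpuPercent, memoryPercent, requests, errors, latenciesMs }) {
  const observed = {
    cpuPercent,
    memoryPercent,
    errorRatePercent: requests > 0 ? (errors * 100) / requests : 0,
    p95LatencyMs: percentile(latenciesMs, 95),
  };
  return Object.entries(THRESHOLDS)
    .filter(([name, limit]) => !(observed[name] < limit)) // missing values also fail
    .map(([name, limit]) => `${name}=${observed[name]} (limit < ${limit})`);
}
```

For 20 latency samples of 10, 20, ..., 200 ms the nearest rank is ceil(0.95 × 20) = 19, so p95 is 190 ms; with 20 errors in 1000 requests the error-rate gate reports `errorRatePercent=2 (limit < 1)`.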
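## Infrastructure as Code
### Terraform Configuration
```hcl
# main.tf
provider "aws" {
  region = "us-east-1"
}

resource "aws_ecs_cluster" "main" {
  name = "app-cluster"
}

resource "aws_ecs_service" "app" {
  name            = "app-service"
  cluster         = aws_ecs_cluster.main.id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 3
  launch_type     = "FARGATE"

  # Required for awsvpc tasks. var.private_subnet_ids and aws_security_group.app
  # are placeholders assumed to be defined elsewhere in your configuration.
  network_configuration {
    subnets         = var.private_subnet_ids
    security_groups = [aws_security_group.app.id]
  }

  # aws_lb_target_group.app is defined elsewhere; with awsvpc it needs target_type = "ip".
  load_balancer {
    target_group_arn = aws_lb_target_group.app.arn
    container_name   = "app"
    container_port   = 3000
  }
}

resource "aws_ecs_task_definition" "app" {
  family                   = "app"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = "512"
  memory                   = "1024"

  container_definitions = jsonencode([
    {
      name      = "app"
      image     = "your-registry/app:latest"
      essential = true
      portMappings = [
        {
          containerPort = 3000
          protocol      = "tcp"
        }
      ]
      environment = [
        {
          name  = "NODE_ENV"
          value = "production"
        }
      ]
    }
  ])
}
```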
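The `cpu`/`memory` pair in a Fargate task definition must be one of the combinations Fargate supports, otherwise AWS rejects the task definition. A sketch of a pre-check covering only the 0.25 to 4 vCPU Linux sizes as we understand them (verify against the current AWS Fargate task size table); `isValidFargateSize` is hypothetical:

```javascript
// fargate-size.js: hypothetical validation of Fargate cpu/memory pairs.

// Inclusive numeric range with a fixed step.
function range(start, end, step) {
  const out = [];
  for (let v = start; v <= end; v += step) out.push(v);
  return out;
}

// CPU units -> allowed memory in MiB (8 and 16 vCPU sizes omitted).
const FARGATE_MEMORY = {
  256: [512, 1024, 2048],
  512: range(1024, 4096, 1024),
  1024: range(2048, 8192, 1024),
  2048: range(4096, 16384, 1024),
  4096: range(8192, 30720, 1024),
};

// Terraform passes cpu/memory as strings ("512", "1024"), so accept both forms.
function isValidFargateSize(cpu, memory) {
  const allowed = FARGATE_MEMORY[Number(cpu)];
  return Array.isArray(allowed) && allowed.includes(Number(memory));
}
```

The task definition above (`cpu = "512"`, `memory = "1024"`) passes; `cpu = "1024"` with `memory = "1024"` would not, because 1 vCPU needs at least 2048 MiB.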
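## Best Practices
### ✅ DO
- Use infrastructure as code
- Implement CI/CD pipelines
- Use container orchestration
- Implement health checks
- Use rolling deployments
- Have a rollback procedure
- Monitor deployments
- Document emergency procedures
- Use secret management
- Implement blue-green or canary deployments
### ❌ DON'T
- Deploy straight to production without going through staging
- Skip tests before deploying
- Forget to back up before running migrations
- Deploy without a rollback plan
- Skip monitoring after a deployment
- Hard-code credentials
- Deploy during peak hours (unless necessary)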
## Resources
- [Kubernetes Documentation](https://kubernetes.io/docs/)
- [Docker Documentation](https://docs.docker.com/)
- [GitHub Actions](https://docs.github.com/en/actions)
- [AWS ECS](https://docs.aws.amazon.com/ecs/)
- [Terraform](https://www.terraform.io/docs)