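name: Model Deployment
description: Deploy ML models with FastAPI, Docker, and Kubernetes. Use when serving predictions, containerizing, monitoring, or detecting drift, or when facing latency problems, failing health checks, or version conflicts.
keywords: model deployment, FastAPI, Docker, Kubernetes, ML serving, model monitoring, drift detection, A/B testing, CI/CD, mlops, production ml, model versioning, health checks, Prometheus, containerization, rolling updates, blue-green deployment, canary deployment, model registry
license: MIT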

ML Model Deployment

Deploy trained models to production with appropriate serving and monitoring.

Deployment Options

Method	Use case	Latency
REST API	Web services	Medium
Batch	Large-scale processing	N/A
Streaming	Real-time processing	Low
Edge	On-device	Very low

FastAPI Model Server

from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI()
model = joblib.load('model.pkl')

class PredictionRequest(BaseModel):
    features: list[float]

class PredictionResponse(BaseModel):
    prediction: float
    probability: float

@app.get('/health')
def health():
    return {'status': 'healthy'}

@app.post('/predict', response_model=PredictionResponse)
def predict(request: PredictionRequest):
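    features = np.array(request.features).reshape(1, -1)  # one sample -> 2D array
    prediction = model.predict(features)[0]
    # predict_proba is only available on classifiers that implement it
    probability = model.predict_proba(features)[0].max()
    # Cast NumPy scalars to plain floats so Pydantic can validate/serialize them
    return PredictionResponse(prediction=float(prediction), probability=float(probability))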
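A small client is enough to exercise the server from another process, and it needs only the standard library. The sketch below assumes the server above is reachable at http://localhost:8000. build_predict_request and predict are hypothetical helper names and are not part of FastAPI.

```python
import json
import urllib.request


def build_predict_request(base_url, features):
    """Build a POST /predict request whose JSON body matches PredictionRequest."""
    body = json.dumps({"features": [float(x) for x in features]}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, features, timeout=5.0):
    """Send the request and decode the PredictionResponse JSON."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

For a model trained on four features, predict("http://localhost:8000", [5.1, 3.5, 1.4, 0.2]) should return a dict with prediction and probability keys.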

Docker Deployment

FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY model.pkl .
COPY app.py .

EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

Model Monitoring

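from datetime import datetime

class ModelMonitor:
    def __init__(self):
        self.predictions = []
        self.latencies = []

    def log_prediction(self, input_data, prediction, latency):
        self.predictions.append({
            'input': input_data,
            'prediction': prediction,
            'latency': latency,
            'timestamp': datetime.now()
        })
        self.latencies.append(latency)  # kept separately for latency percentiles

    def detect_drift(self, reference_distribution):
        # Compare recent predictions against the reference distribution
        # (placeholder; see references/model-monitoring-drift.md)
        raise NotImplementedError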
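The detect_drift stub above does not specify how the comparison is made. The monitoring reference names one common choice, the two-sample Kolmogorov-Smirnov statistic. It measures the largest gap between the empirical CDFs of a reference sample and a recent sample. The pure-Python sketch below assumes numeric 1-D samples. ks_statistic, drift_detected and the 0.1 cutoff are illustrative assumptions. A fixed cutoff also ignores sample size; scipy.stats.ks_2samp additionally reports a p-value.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: max |F_ref(x) - F_cur(x)| over all sample points."""
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    n, m = len(ref), len(cur)
    d = 0.0
    # Both empirical CDFs are step functions that only jump at sample points,
    # so the supremum of their difference is reached at one of those points.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / n  # fraction of reference values <= x
        f_cur = bisect_right(cur, x) / m  # fraction of current values <= x
        d = max(d, abs(f_ref - f_cur))
    return d


def drift_detected(reference, current, threshold=0.1):
    # Hypothetical fixed cutoff on the statistic; tune per feature
    return ks_statistic(reference, current) > threshold
```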

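Deployment Checklist

[ ] Model validated on the test set
[ ] API endpoints documented
[ ] Health-check endpoint in place
[ ] Authentication configured
[ ] Logging and monitoring set up
[ ] Model versioning in place
[ ] Rollback procedure documented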

Quick Start: Deploy a Model in 6 Steps

# 1. Save the trained model
import joblib
joblib.dump(model, 'model.pkl')

# 2. Create the FastAPI app (see references/fastapi-production-server.md)
# app.py exposes the /predict and /health endpoints

# 3. Create the Dockerfile
cat > Dockerfile << 'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py model.pkl ./
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
EOF

# 4. Build and test locally
docker build -t model-api:v1.0.0 .
docker run -p 8000:8000 model-api:v1.0.0

# 5. Push to the registry
docker tag model-api:v1.0.0 registry.example.com/model-api:v1.0.0
docker push registry.example.com/model-api:v1.0.0

# 6. Deploy to Kubernetes
kubectl apply -f deployment.yaml
kubectl rollout status deployment/model-api
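Step 6 applies a deployment.yaml that is not shown. Because kubectl apply -f also accepts JSON, the sketch below builds a minimal apps/v1 Deployment in Python. The port, probe settings and resource values mirror the examples elsewhere in this document. The replica count and the app: model-api label are assumptions. The Deployment and its container are both named model-api, so that kubectl rollout status deployment/model-api and kubectl set image deployment/model-api model-api=... both match.

```python
import json


def deployment_manifest(name, image, port=8000, replicas=2, readiness_path="/health"):
    """Minimal apps/v1 Deployment with the probes and resources used in this document."""
    labels = {"app": name}  # assumed label; selector and pod template must match
    container = {
        "name": name,  # same as the Deployment name, for 'kubectl set image'
        "image": image,
        "ports": [{"containerPort": port}],
        "livenessProbe": {
            "httpGet": {"path": "/health", "port": port},
            "initialDelaySeconds": 30,
        },
        # The quick-start app only exposes /health; pass "/ready" if the app
        # has a separate readiness endpoint
        "readinessProbe": {
            "httpGet": {"path": readiness_path, "port": port},
            "initialDelaySeconds": 5,
        },
        "resources": {
            "requests": {"memory": "512Mi", "cpu": "500m"},
            "limits": {"memory": "1Gi", "cpu": "1000m"},
        },
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


def write_manifest(manifest, path="deployment.json"):
    """Write the manifest as JSON for 'kubectl apply -f'."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
```

With write_manifest(deployment_manifest("model-api", "registry.example.com/model-api:v1.0.0")), the command kubectl apply -f deployment.json can stand in for kubectl apply -f deployment.yaml.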

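Preventing Known Issues

1. No health checks = downtime

Problem: without probes, Kubernetes treats a pod as ready as soon as its container starts. The load balancer then routes traffic to pods that are still loading the model or have hung, causing errors such as 503s.

Solution: implement liveness and readiness probes: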

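# app.py
from fastapi import FastAPI, HTTPException

app = FastAPI()
model = None  # assigned once the model has been loaded (e.g. at startup)

@app.get("/health")  # Liveness: is the process alive?
async def health():
    return {"status": "healthy"}

@app.get("/ready")  # Readiness: can it handle traffic?
async def ready():
    if model is None:  # model not loaded yet
        raise HTTPException(status_code=503, detail="Not ready")
    return {"status": "ready"}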

# deployment.yaml
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
readinessProbe:
  httpGet:
    path: /ready
    port: 8000
  initialDelaySeconds: 5
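After a rollout, a deploy script or CI job can wait on the readiness endpoint before it runs smoke tests or shifts traffic. The standard-library sketch below assumes the /ready endpoint above. wait_until_ready and its default timeout and interval are hypothetical.

```python
import http.client
import time
import urllib.request


def wait_until_ready(url, timeout=60.0, interval=2.0):
    """Poll url until it answers HTTP 200 or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts and HTTP errors such as 503
            # (urllib.error.HTTPError is an OSError subclass) mean "not ready yet"
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_until_ready("http://localhost:8000/ready", timeout=120) returns True once the model has loaded, and False if it never becomes ready within two minutes.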

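2. Model-not-found error in the container

Problem: the container fails at startup with FileNotFoundError: model.pkl.

Solution: make sure the Dockerfile copies the model file to the same path the code loads it from: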

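# ❌ Wrong: model copied to a different directory
COPY model.pkl /app/models/  # but the code expects /app/model.pkl

# ✅ Right: one consistent path
COPY model.pkl /models/model.pkl
ENV MODEL_PATH=/models/model.pkl

# In Python:
import os
import joblib

model_path = os.getenv("MODEL_PATH", "/models/model.pkl")
model = joblib.load(model_path)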
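To turn a bare FileNotFoundError into an actionable message at startup, the loader can check the path before it calls joblib.load. The sketch below follows the MODEL_PATH convention above. resolve_model_path is a hypothetical helper, and its environ parameter exists only so it can be tested without changing the real environment.

```python
import os
from pathlib import Path


def resolve_model_path(environ=None, default="/models/model.pkl"):
    """Return the model path from MODEL_PATH, failing fast with a clear message."""
    env = os.environ if environ is None else environ
    path = Path(env.get("MODEL_PATH", default))
    if not path.is_file():
        raise FileNotFoundError(
            f"Model file not found at {path}. Check the COPY destination in the "
            "Dockerfile and the MODEL_PATH environment variable."
        )
    return path
```

At startup this becomes model = joblib.load(resolve_model_path()), so a bad image fails immediately with the path it tried.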

3. No input validation = 500 errors

Problem: invalid input crashes the API with an unhandled exception.

Solution: let Pydantic validate requests automatically:

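import numpy as np
from pydantic import BaseModel, Field, field_validator

class PredictionRequest(BaseModel):
    # Pydantic v2: min_length/max_length replace v1's min_items/max_items
    features: list[float] = Field(..., min_length=1, max_length=100)

    @field_validator('features')
    @classmethod
    def validate_finite(cls, v):
        # Reject NaN and +/-inf, which float fields otherwise accept
        if not all(np.isfinite(val) for val in v):
            raise ValueError("All features must be finite")
        return v

# FastAPI validates automatically and returns 422 for invalid requests
@app.post("/predict")
async def predict(request: PredictionRequest):
    # The request is guaranteed to be valid here
    ...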

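4. No drift monitoring = silent degradation

Problem: model performance degrades over time, and nobody notices until users complain.

Solution: implement drift detection (see references/model-monitoring-drift.md):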

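import time

import numpy as np

# Assumes the fuller ModelMonitor from references/model-monitoring-drift.md
# (reference_data, drift_threshold, should_retrain), not the stub above;
# alert_manager is likewise assumed to be configured elsewhere
monitor = ModelMonitor(reference_data=training_data, drift_threshold=0.1)

@app.post("/predict")
async def predict(request: PredictionRequest):
    features = np.array(request.features).reshape(1, -1)
    start = time.perf_counter()
    prediction = model.predict(features)
    latency = time.perf_counter() - start
    monitor.log_prediction(features, prediction, latency)

    # Raise an alert when drift is detected
    if monitor.should_retrain():
        alert_manager.send_alert("Model drift detected - retraining recommended")

    return {"prediction": prediction.tolist()}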
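The snippet above does not say which metric drift_threshold=0.1 applies to. One possible reading, shown below, compares binned feature distributions using Jensen-Shannon divergence, one of the metrics the monitoring reference lists. With base-2 logarithms the score lies in [0, 1]. The bin edges, the helper names and this interpretation of the threshold are all assumptions.

```python
import math
from bisect import bisect_right


def histogram(values, edges):
    """Normalized histogram; values outside [edges[0], edges[-1]] go to the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect_right(edges, v) - 1
        i = min(max(i, 0), len(counts) - 1)  # clip out-of-range values
        counts[i] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("values must be non-empty")
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, so 0 <= JS <= 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with x == 0 contribute 0; y > 0 wherever x > 0 since y is the mixture
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def is_drifted(reference, current, edges, threshold=0.1):
    # Hypothetical reading of drift_threshold=0.1 as a JS-divergence cutoff
    return js_divergence(histogram(reference, edges), histogram(current, edges)) > threshold
```

Unlike KL divergence, JS divergence is symmetric and stays finite when one window has empty bins. That property matters for small monitoring windows.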

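5. Missing resource limits = OOM kills

Problem: pods are killed by the OOM killer (status OOMKilled), interrupting service.

Solution: set memory/CPU requests and limits sized to the model's real footprint. The scheduler then reserves enough memory on the node, and one leaking pod cannot starve its neighbors: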

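resources:
  requests:
    memory: "512Mi"  # reserved when scheduling (guaranteed)
    cpu: "500m"      # 0.5 CPU
  limits:
    memory: "1Gi"    # hard cap; exceeding it gets the container OOMKilled
    cpu: "1000m"     # CPU use above this is throttled, not killed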

# Check actual usage (requires metrics-server):
kubectl top pods
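Kubernetes resource quantities use their own suffixes. Mi and Gi are binary units (2^20 and 2^30 bytes), and m means millicores, so 500m is half a CPU. The sketch below parses the subset used above and checks that each request does not exceed its limit. The API server rejects such a manifest anyway, so the check only catches the mistake before kubectl apply. parse_memory, parse_cpu and check_resources are hypothetical helpers that cover only the common suffixes.

```python
_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1G' to bytes."""
    for suffix, factor in _BINARY.items():  # check two-letter suffixes first
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    for suffix, factor in _DECIMAL.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain bytes


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '500m' or '2' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def check_resources(resources: dict) -> None:
    """Raise ValueError if any request exceeds its limit."""
    requests, limits = resources["requests"], resources["limits"]
    if parse_memory(requests["memory"]) > parse_memory(limits["memory"]):
        raise ValueError("memory request exceeds limit")
    if parse_cpu(requests["cpu"]) > parse_cpu(limits["cpu"]):
        raise ValueError("cpu request exceeds limit")
```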

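6. No rollback plan = stuck on a bad deployment

Problem: a new model version ships with a bug, and there is no quick way back.

Solution: tag every image with a version and keep previous revisions available: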

# Deploy with an explicit version tag
kubectl set image deployment/model-api model-api=registry/model-api:v1.2.0

# If something goes wrong, roll back to the previous revision
kubectl rollout undo deployment/model-api

# Or pin a specific earlier version
kubectl set image deployment/model-api model-api=registry/model-api:v1.1.0
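kubectl rollout undo returns to the previous revision. Pinning a specific version, however, requires knowing which tag came before the current one. A plain string sort gets semantic versions wrong, because v1.10.0 sorts before v1.9.0, so the sketch below compares numeric tuples. It assumes tags of the form vMAJOR.MINOR.PATCH and leaves out how the tag list is fetched from the registry. previous_version is a hypothetical helper.

```python
import re

_SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_version(tag):
    """'v1.10.0' -> (1, 10, 0); raises ValueError for non-semver tags."""
    match = _SEMVER.match(tag)
    if match is None:
        raise ValueError(f"not a semantic version tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def previous_version(tags, current):
    """Highest semver tag strictly older than current, or None; other tags are skipped."""
    cur = parse_version(current)
    older = [t for t in tags if _SEMVER.match(t) and parse_version(t) < cur]
    return max(older, key=parse_version, default=None)
```

The result can be fed into kubectl set image deployment/model-api model-api=registry/model-api:<tag>.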

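7. One-at-a-time predictions = slow batch jobs

Problem: scoring 10,000 predictions one request at a time can take hours.

Solution: add a batch endpoint: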

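import numpy as np
from pydantic import BaseModel

class BatchPredictionRequest(BaseModel):
    instances: list[list[float]]  # one feature vector per row

@app.post("/predict/batch")
async def predict_batch(request: BatchPredictionRequest):
    # Predict on all rows in one vectorized call
    features = np.array(request.instances)
    predictions = model.predict(features)  # much faster than a per-row loop
    return {"predictions": predictions.tolist()}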
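On the client side, 10,000 rows are usually better sent as a handful of requests than as one huge body or 10,000 tiny ones. The sketch below splits the rows into fixed-size {"instances": [...]} payloads that match the batch endpoint above. The batch size of 500 is an arbitrary assumption; tune it against request-size limits and latency.

```python
import json
from itertools import islice


def chunked(items, size):
    """Yield consecutive lists of at most size items."""
    if size <= 0:
        raise ValueError("size must be positive")
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def batch_payloads(instances, batch_size=500):
    """JSON bodies for POST /predict/batch, one per chunk of rows."""
    for batch in chunked(instances, batch_size):
        yield json.dumps({"instances": batch})
```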

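8. No CI/CD validation = bad models in production

Problem: a model that fails basic tests gets deployed and breaks production.

Solution: validate the model in the CI pipeline (see references/cicd-ml-models.md):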

# .github/workflows/deploy.yml
- name: Validate model performance
  run: |
    python scripts/validate_model.py \
      --model model.pkl \
      --test-data test.csv \
      --min-accuracy 0.85  # the step fails if accuracy is below this threshold
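scripts/validate_model.py is not shown in this document. The core of such a gate can be as small as the sketch below. It computes accuracy on held-out labels and returns a non-zero exit code when accuracy falls below --min-accuracy, which makes the GitHub Actions step fail. Loading the model and test data is omitted, and accuracy_gate is a hypothetical name.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("labels and predictions must be non-empty and equally long")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_gate(y_true, y_pred, min_accuracy):
    """Return a process exit code: 0 if accuracy >= min_accuracy, else 1."""
    acc = accuracy(y_true, y_pred)
    passed = acc >= min_accuracy
    print(f"accuracy={acc:.4f} min={min_accuracy} -> {'PASS' if passed else 'FAIL'}")
    return 0 if passed else 1
```

Inside the script, a line such as sys.exit(accuracy_gate(y_test, model.predict(X_test), args.min_accuracy)) would connect the gate to the CI step.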

Best Practices

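Version everything: models (semantic versioning), Docker images, deployments
Monitor continuously: latency, error rate, drift, resource usage
Test before deploying: unit tests, integration tests, performance benchmarks
Roll out gradually: canary first (e.g. 10% of traffic), then full rollout
Plan for rollback: keep previous versions and document the procedure
Log predictions: enables debugging and drift detection
Set resource limits: prevents OOM kills and resource contention
Use health checks: enables correct load balancing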
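Canary traffic splitting is usually handled by the ingress or service mesh, but the underlying idea is easy to sketch. Hash a stable request or user ID into 100 buckets and send the lowest canary_percent buckets to the new version, so each user consistently hits the same version. route and the ID format are assumptions made for illustration.

```python
import hashlib


def route(request_id: str, canary_percent: int = 10) -> str:
    """Deterministically assign a request/user ID to 'canary' or 'stable'."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # roughly uniform in 0..99
    return "canary" if bucket < canary_percent else "stable"
```

Raising canary_percent step by step (10, 50, 100) only ever adds buckets. Users already on the canary therefore stay there as the rollout widens.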

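When to Load Reference Files

Load the reference files for detailed implementations:

FastAPI production server: load references/fastapi-production-server.md for a complete production-ready FastAPI implementation. It covers error handling, validation (Pydantic models), logging, health/readiness probes, batch prediction, model versioning, middleware, exception handling, and performance optimization (caching, async).
Model monitoring and drift: load references/model-monitoring-drift.md for the ModelMonitor implementation. It covers KS-test drift detection, Jensen-Shannon divergence, Prometheus metrics integration, alert configuration (Slack, email), a continuous monitoring service, and dashboard endpoints.
Containerization and deployment: load references/containerization-deployment.md for multi-stage Dockerfiles, model versioning inside containers, and Docker Compose setup. It also covers A/B testing with Nginx, Kubernetes deployments (rolling update, blue-green, canary), GitHub Actions CI/CD, and a deployment checklist.
CI/CD for ML models: load references/cicd-ml-models.md for a complete GitHub Actions pipeline. It covers model validation, data validation, automated testing, security scanning, performance benchmarks, automatic rollback, and deployment strategies.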