Argo CD GitOps Continuous Delivery Skill (argocd)

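Argo CD is a declarative GitOps continuous delivery tool built for Kubernetes that automates application deployment, synchronization, rollback, and multi-cluster management. It uses a Git repository as the source that defines the desired state and keeps the cluster state consistent with Git. Keywords: Argo CD, GitOps, Kubernetes, continuous delivery, deployment automation, application management, DevOps, cloud native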

CI/CD · Updated 3/24/2026

name: argocd
description: GitOps continuous delivery for Kubernetes deployments with Argo CD. Use when implementing declarative GitOps workflows, application sync/rollback, multi-cluster deployments, progressive delivery, or CD automation. Triggers: argocd, argo cd, gitops, application, sync, rollback, app of apps, applicationset, declarative, continuous delivery, CD, deployment automation, kubernetes deployment, multi-cluster, canary deployment, blue-green.
allowed-tools: Read, Grep, Glob, Edit, Write, Bash

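Argo CD GitOps Continuous Delivery

Overview

Argo CD is a declarative GitOps continuous delivery tool built for Kubernetes that automates application deployment and lifecycle management. It follows the GitOps pattern, in which a Git repository is the source of truth that defines the desired application state.

Core Concepts

  • Application: a group of Kubernetes resources defined by manifests in Git
  • Application source type: the tool or format used to define the application (Helm, Kustomize, plain YAML, Jsonnet)
  • Target state: the desired state of the application, as represented in Git
  • Live state: the actual state of the application running in Kubernetes
  • Sync status: whether the live state matches the target state
  • Sync: the process of making the live state match the target state
  • Health: the health status of the application's resources
  • Refresh: comparing the latest code in Git with the live state
  • Project: a logical grouping of applications with RBAC policies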
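The relationship between target state, live state, and sync status can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Argo CD's actual diffing logic: the `sync_status` helper and the dictionary layout keyed by `(kind, name)` are hypothetical, and the real comparison also normalizes defaulted fields.

```python
# Hypothetical sketch: resources are keyed by (kind, name) and hold plain dict manifests.
def sync_status(target: dict, live: dict) -> str:
    """Return "Synced" if the live state matches the target state, else "OutOfSync"."""
    for key, manifest in target.items():
        if live.get(key) != manifest:
            return "OutOfSync"  # resource is missing or has drifted
    if set(live) - set(target):
        return "OutOfSync"  # tracked resource no longer in Git (would need pruning)
    return "Synced"


target = {("Deployment", "myapp"): {"replicas": 3}}
live = {("Deployment", "myapp"): {"replicas": 2}}
print(sync_status(target, live))    # OutOfSync
print(sync_status(target, target))  # Synced
```

A sync, in these terms, is whatever changes `live` until the function returns "Synced" again.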

Installation and Setup

Install Argo CD in Kubernetes

# Create the namespace
kubectl create namespace argocd

# Install Argo CD
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Alternatively, use the high-availability install for production (in place of the standard manifest above)
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml

# Access the UI
kubectl port-forward svc/argocd-server -n argocd 8080:443

# Get the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

# Install the CLI
brew install argocd  # macOS
# Or download from https://github.com/argoproj/argo-cd/releases

Initial Configuration

# Log in via the CLI
argocd login localhost:8080

# Change the admin password
argocd account update-password

# Register an external cluster
argocd cluster add my-cluster-context

# Add a Git repository
argocd repo add https://github.com/myorg/myrepo.git --username myuser --password mytoken

Repository Structure

Recommended Directory Layout

gitops-repo/
├── apps/                           # Application definitions
│   ├── base/                       # Base application configuration
│   │   ├── app1/
│   │   │   ├── kustomization.yaml
│   │   │   └── deployment.yaml
│   │   └── app2/
│   └── overlays/                   # Environment-specific overlays
│       ├── dev/
│       │   ├── kustomization.yaml
│       │   └── patches/
│       ├── staging/
│       └── production/
├── charts/                         # Helm charts (if using Helm)
│   └── myapp/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
├── argocd/                         # Argo CD configuration
│   ├── projects/                   # AppProjects
│   ├── applications/               # Application manifests
│   │   ├── app1.yaml
│   │   └── app2.yaml
│   └── applicationsets/            # ApplicationSets
│       ├── cluster-apps.yaml
│       └── tenant-apps.yaml
└── bootstrap/                      # App of Apps bootstrap
    └── root-app.yaml

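Separation Strategies

Single Repository

One repository for all environments

  • Pros: simpler to manage, easier to track changes
  • Cons: every team has access, harder to enforce separation

One Repository per Environment

Separate repositories for development, staging, and production

  • Pros: stronger security boundaries, clear promotion path
  • Cons: more repositories to manage, duplicated configuration

One Repository per Team

Separate repositories for each team or service

  • Pros: team autonomy, clear ownership
  • Cons: cross-team coordination is more complex

Application Manifests

Basic Application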

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
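  # The finalizer ensures cascading deletion of the application's resources
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  # Project name ('default' is the built-in project)
  project: default

  # Source configuration
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/production/myapp

  # Destination cluster and namespace
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp-production

  # Sync policy
  syncPolicy:
    automated:
      prune: true # Delete resources that no longer exist in Git
      selfHeal: true # Re-sync automatically when the cluster state drifts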
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
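
The retry settings above imply a growing delay between attempts. The sketch below works out that schedule, assuming each delay is `duration × factor^attempt` capped at `maxDuration`; the `backoff_delays` helper is hypothetical and only illustrates the arithmetic.

```python
def backoff_delays(duration_s, factor, max_duration_s, limit):
    """Delay in seconds before each retry: duration * factor**attempt, capped at max_duration."""
    return [min(duration_s * factor ** attempt, max_duration_s) for attempt in range(limit)]


# duration: 5s, factor: 2, maxDuration: 3m (180s), limit: 5
print(backoff_delays(5, 2, 180, 5))  # [5, 10, 20, 40, 80]
```

With these values the cap is never reached; a larger `limit` or `factor` would flatten the tail at 180 seconds.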

Application with Helm

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-helm
  namespace: argocd
spec:
  project: default

  source:
    repoURL: https://github.com/myorg/charts.git
    targetRevision: main
    path: charts/myapp
    helm:
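      # Helm value files
      valueFiles:
        - values.yaml
        - values-production.yaml

      # Inline values (override the value files above)
      values: |
        replicaCount: 3
        image:
          tag: v1.2.3
        resources:
          limits:
            cpu: 500m
            memory: 512Mi

      # Override specific values (parameters take precedence over values and value files)
      parameters:
        - name: image.repository
          value: myregistry.io/myapp

      # Skip CRD installation (false installs the chart's CRDs)
      skipCrds: false

      # Release name
      releaseName: myapp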

  destination:
    server: https://kubernetes.default.svc
    namespace: myapp

  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Application with Kustomize

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-kustomize
  namespace: argocd
spec:
  project: default

  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/overlays/production
    kustomize:
      # Kustomize version (the version must be configured in argocd-cm)
      version: v5.0.0

      # Name prefix/suffix
      namePrefix: prod-
      nameSuffix: -v1

      # Images to override, in the [old_name=]name:tag string form
      images:
        - myapp=myregistry.io/myapp:v1.2.3

      # Common labels
      commonLabels:
        environment: production
        managed-by: argocd

      # Common annotations
      commonAnnotations:
        deployed-by: argocd

      # Replica overrides
      replicas:
        - name: myapp-deployment
          count: 3

  destination:
    server: https://kubernetes.default.svc
    namespace: myapp-production

ApplicationSets

Cluster Generator

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-apps
  namespace: argocd
spec:
  # Generate an Application for each registered cluster that matches the selector
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
          matchExpressions:
            - key: region
              operator: In
              values: [us-east-1, us-west-2]
        values:
          # Extra values made available to the template
          revision: main

  template:
    metadata:
      name: "{{name}}-myapp"
      labels:
        cluster: "{{name}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myrepo.git
        targetRevision: "{{values.revision}}"
        path: apps/production/myapp
        helm:
          parameters:
            - name: cluster.name
              value: "{{name}}"
            - name: cluster.region
              value: "{{metadata.labels.region}}"
      destination:
        server: "{{server}}"
        namespace: myapp
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

Git Directory Generator

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: git-directory-apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/myorg/myrepo.git
        revision: HEAD
        directories:
          - path: apps/production/*
          - path: apps/production/exclude-this
            exclude: true

  template:
    metadata:
      name: "{{path.basename}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myrepo.git
        targetRevision: HEAD
        path: "{{path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

Git File Generator

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: git-file-apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/myorg/myrepo.git
        revision: HEAD
        files:
          - path: apps/*/config.json

  template:
    metadata:
      name: "{{app.name}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myrepo.git
        targetRevision: HEAD
        path: "apps/{{app.name}}"
        helm:
          parameters:
            - name: replicaCount
              value: "{{app.replicas}}"
            - name: environment
              value: "{{app.environment}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{app.namespace}}"

Matrix Generator

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: matrix-apps
  namespace: argocd
spec:
  generators:
    # The matrix generator combines the parameters of multiple generators
    - matrix:
        generators:
          # First dimension: clusters
          - clusters:
              selector:
                matchLabels:
                  env: production
          # Second dimension: Git directories
          - git:
              repoURL: https://github.com/myorg/myrepo.git
              revision: HEAD
              directories:
                - path: apps/*

  template:
    metadata:
      name: "{{path.basename}}-{{name}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myrepo.git
        targetRevision: HEAD
        path: "{{path}}"
      destination:
        server: "{{server}}"
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

List Generator (Multi-Tenant)

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: tenant-apps
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - tenant: team-a
            namespace: team-a-prod
            repoURL: https://github.com/team-a/apps.git
            quota:
              cpu: "10"
              memory: 20Gi
          - tenant: team-b
            namespace: team-b-prod
            repoURL: https://github.com/team-b/apps.git
            quota:
              cpu: "20"
              memory: 40Gi

  template:
    metadata:
      name: "{{tenant}}-app"
      labels:
        tenant: "{{tenant}}"
    spec:
      project: "{{tenant}}"
      source:
        repoURL: "{{repoURL}}"
        targetRevision: main
        path: production
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{namespace}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

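ApplicationSet Patterns

When to Use ApplicationSets

ApplicationSets use generators to automate the creation and management of multiple Argo CD Applications. Use them when you need to:

  • Deploy the same configuration to multiple clusters
  • Manage multiple tenants or teams
  • Discover applications from the structure of a Git repository
  • Implement environment promotion strategies

Generator Selection Guide

Generator      Use case                                            Example
Cluster        Deploy the same application to multiple clusters    Multi-region deployments
Git directory  Generate applications from repository directories   Monorepo with one application per directory
Git file       Generate applications from config files in Git      JSON/YAML config per application
List           Static list of parameters                           Tenant definitions
Matrix         Combine multiple generators                         Applications across clusters and environments
Pull request   Preview environment per PR                          Ephemeral test environments
SCM provider   Discover repositories from GitHub/GitLab            Organization-wide application discovery

Multi-Environment with Git Directories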

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: multi-env-apps
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          # First: discover applications from the directory structure
          - git:
              repoURL: https://github.com/myorg/apps.git
              revision: HEAD
              directories:
                - path: apps/*
          # Second: apply each one to multiple environments
          - list:
              elements:
                - env: dev
                  cluster: https://dev-cluster.example.com
                  replicas: "1"
                - env: staging
                  cluster: https://staging-cluster.example.com
                  replicas: "2"
                - env: production
                  cluster: https://prod-cluster.example.com
                  replicas: "3"

  template:
    metadata:
      name: "{{path.basename}}-{{env}}"
      labels:
        app: "{{path.basename}}"
        env: "{{env}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/apps.git
        targetRevision: HEAD
        path: "{{path}}"
        helm:
          parameters:
            - name: environment
              value: "{{env}}"
            - name: replicaCount
              value: "{{replicas}}"
      destination:
        server: "{{cluster}}"
        namespace: "{{path.basename}}-{{env}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

Pull Request Preview Environments

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: pr-preview
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: myorg
          repo: myapp
          tokenRef:
            secretName: github-token
            key: token
          labels:
            - preview
        requeueAfterSeconds: 60

  template:
    metadata:
      name: "myapp-pr-{{number}}"
      labels:
        preview: "true"
        pr: "{{number}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myapp.git
        targetRevision: "{{head_sha}}"
        path: k8s/overlays/preview
        kustomize:
          commonLabels:
            pr: "{{number}}"
          # Image overrides use the [old_name=]name:tag string form
          images:
            - "myapp:pr-{{number}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "myapp-pr-{{number}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true

SCM Provider Discovery

apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: org-repos
  namespace: argocd
spec:
  generators:
    - scmProvider:
        github:
          organization: myorg
          tokenRef:
            secretName: github-token
            key: token
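        # A repository is included if it matches any one filter; put both conditions in a single filter to require both
        filters: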
          - repositoryMatch: ".*-service$"
          - pathsExist: [k8s/production]

  template:
    metadata:
      name: "{{repository}}"
    spec:
      project: default
      source:
        repoURL: "{{url}}"
        targetRevision: main
        path: k8s/production
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{repository}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

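Sync Strategies

Strategy Selection Guide

Strategy                  Use case                                 Risk  Automation
Automated + self-heal     Non-production environments                    Full
Automated (no self-heal)  Staging that needs manual intervention         Partial
Manual                    Production deployments
Sync windows              Business-hours restrictions                    Scheduled
Progressive (rolling)     Gradual production rollouts                    Conditional

Automated Sync with Conditions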

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: conditional-sync
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp

  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false

    syncOptions:
      - CreateNamespace=true
      - PruneLast=true
      - PrunePropagationPolicy=foreground
      - RespectIgnoreDifferences=true
      - ApplyOutOfSyncOnly=true

    # Retry with exponential backoff
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

  # Ignore manual changes to specific fields
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
    - group: ""
      kind: Service
      jqPathExpressions:
        - .spec.ports[] | select(.nodePort != null) | .nodePort

Sync Windows (Time-Based Deployment Control)

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  description: Production project with sync windows

  sourceRepos:
    - "*"

  destinations:
    - namespace: "*"
      server: https://prod-cluster.example.com

  # Define the sync windows
  syncWindows:
    # Allow syncs during business hours (Mon-Fri, 09:00-17:00 UTC)
    - kind: allow
      schedule: "0 9 * * 1-5"
      duration: 8h
      applications:
        - "*"
      namespaces:
        - production-*
      clusters:
        - https://prod-cluster.example.com

    # Block syncs during peak traffic (daily, 12:00-14:00 UTC)
    - kind: deny
      schedule: "0 12 * * *"
      duration: 2h
      applications:
        - "*"

    # Emergency sync window for critical-app (always active; manualSync permits manually triggered syncs)
    - kind: allow
      schedule: "* * * * *"
      duration: 1h
      manualSync: true
      applications:
        - critical-app

Selective Sync (Resource-Level Control)

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: selective-sync
  namespace: argocd
  annotations:
    # Resource-level option: do not prune this resource when it is removed from Git
    argocd.argoproj.io/sync-options: Prune=false
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp

  # Ignore differences in specific fields of specific resources
  ignoreDifferences:
    - group: "*"
      kind: Secret
      name: external-secret
      jsonPointers:
        - /data

  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      # Apply only the resources that are out of sync
      - ApplyOutOfSyncOnly=true

Blue-Green Sync Strategy

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-blue-green
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp

  syncPolicy:
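    # Manual sync for production (no automated block)
    syncOptions:
      - CreateNamespace=true

    # Blue-green ordering uses sync waves, which are set per resource with the
    # argocd.argoproj.io/sync-wave annotation rather than in the Application spec:
    #   wave 0: deploy the new version (green)
    #   wave 1: run smoke tests
    #   wave 2: switch traffic
    #   wave 3: remove the old version (blue)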

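Rollback Procedures

Automatic Rollback Strategies

Application-Level Rollback

# View the sync history
argocd app history myapp

# Roll back to the previous sync (rollback requires automated sync to be disabled)
argocd app rollback myapp

# Roll back to a specific history ID
argocd app rollback myapp 5

# Roll back with pruning
argocd app rollback myapp 5 --prune

Git-Based Rollback (Recommended)

# Revert the Git commit
git revert HEAD
git push origin main

# With automated sync enabled, Argo CD syncs the reverted state automatically
# This keeps a complete audit trail in Git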

Rollback with Argo Rollouts

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
  namespace: myapp
spec:
  replicas: 5
  strategy:
    blueGreen:
      activeService: myapp-active
      previewService: myapp-preview
      autoPromotionEnabled: false
      autoPromotionSeconds: 30
      scaleDownDelaySeconds: 300
      scaleDownDelayRevisionLimit: 1

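      # Keep pods of the new and old ReplicaSets on different nodes.
      # Metric-based automatic abort would instead need prePromotionAnalysis or
      # postPromotionAnalysis with an AnalysisTemplate, which this example omits.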
      antiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution: {}

  revisionHistoryLimit: 5

  selector:
    matchLabels:
      app: myapp

  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:stable
---
# Manual rollback commands
# kubectl argo rollouts abort myapp
# kubectl argo rollouts undo myapp
# kubectl argo rollouts retry myapp

Rollback on Health Check Failure

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: auto-rollback-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp

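  syncPolicy:
    automated:
      prune: true
      selfHeal: false # Disable self-heal so manual rollbacks are not reverted

    # Sync options (validation and shared-resource handling); these do not trigger a rollback by themselves
    syncOptions:
      - Validate=true
      - FailOnSharedResource=false

    # Retry the sync on failure
    retry:
      limit: 3
      backoff:
        duration: 10s
        factor: 2
        maxDuration: 1m


# Use a PreSync hook to back up the current state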
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pre-sync-backup
  namespace: myapp
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      containers:
        - name: backup
          image: kubectl:latest
          command:
            - /bin/sh
            - -c
            - |
              kubectl get all -n myapp -o yaml > /backup/previous-state.yaml
      restartPolicy: Never
---
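# Use a SyncFail hook for automatic rollback
# Note: argocd app rollback is rejected while automated sync is enabled on the application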
apiVersion: batch/v1
kind: Job
metadata:
  name: rollback-on-fail
  namespace: myapp
  annotations:
    argocd.argoproj.io/hook: SyncFail
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      serviceAccountName: argocd-rollback
      containers:
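        - name: rollback
          image: quay.io/argoproj/argocd:latest
          command:
            - /bin/sh
            - -c
            - |
              # Assumes ARGOCD_SERVER and ARGOCD_TOKEN are injected into the container, e.g. from a Secret
              argocd app rollback myapp --auth-token "$ARGOCD_TOKEN"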
      restartPolicy: Never

Emergency Rollback Runbook

# 1. Check the application status
argocd app get myapp
argocd app history myapp

# 2. Identify the last known good revision
argocd app history myapp  # note the ID of the last good deployment from the ID and REVISION columns

# 3. Quickly roll back to the previous revision
argocd app rollback myapp

# 4. If the rollback fails, force a sync with replace
argocd app sync myapp --force --replace --prune

# 5. If it still fails, revert in Git and force a sync
cd gitops-repo
git revert HEAD --no-commit
git commit -m "Emergency rollback"
git push origin main
argocd app sync myapp --force

# 6. If needed, clean up resources manually
kubectl delete deployment myapp -n myapp
argocd app sync myapp --force

# 7. Verify health and sync status
argocd app wait myapp --health --timeout 300

# 8. Record the incident
echo "Rollback completed at $(date)" >> /var/log/incidents/myapp-rollback.log

Rollback Testing (Staging)

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: rollback-test
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp-test

  syncPolicy:
    automated:
      prune: true
      selfHeal: true


# PostSync hook that tests the rollback capability (the rollback step needs automated sync to be disabled)
---
apiVersion: batch/v1
kind: Job
metadata:
  name: test-rollback
  namespace: myapp-test
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      serviceAccountName: argocd-test
      containers:
        - name: test
          image: quay.io/argoproj/argocd:latest
          command:
            - /bin/sh
            - -c
            - |
              # Check application health
              argocd app wait rollback-test --health --timeout 60

              # Perform the rollback test
              argocd app rollback rollback-test

              # Verify the rollback succeeded
              argocd app wait rollback-test --health --timeout 60

              # Re-sync to the latest revision
              argocd app sync rollback-test
      restartPolicy: Never

App of Apps Pattern

Root Application (Bootstrap)

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default

  source:
    repoURL: https://github.com/myorg/gitops.git
    targetRevision: HEAD
    path: argocd/applications
    directory:
      recurse: true

  destination:
    server: https://kubernetes.default.svc
    namespace: argocd

  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Infrastructure Application

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infrastructure
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/gitops.git
    targetRevision: HEAD
    path: argocd/infrastructure
    directory:
      recurse: true
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Layered App of Apps

root-app
├── infrastructure (sync-wave: 0)
│   ├── cert-manager
│   ├── ingress-nginx
│   └── external-dns
├── platform (sync-wave: 1)
│   ├── monitoring
│   ├── logging
│   └── security
└── applications (sync-wave: 2)
    ├── app1
    ├── app2
    └── app3

Sync Waves and Hooks

Ordered Sync Waves

apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  annotations:
    # Lower numbers sync first
    argocd.argoproj.io/sync-wave: "0"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: myapp
  annotations:
    argocd.argoproj.io/sync-wave: "1"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
  annotations:
    argocd.argoproj.io/sync-wave: "2"
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: myapp
  annotations:
    argocd.argoproj.io/sync-wave: "3"
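
The ordering implied by the annotations above can be sketched as grouping resources by wave number and applying the groups in ascending order. The helpers below are hypothetical and deliberately simplified: resources without the annotation are treated as wave 0, and the wait for each wave to become healthy before the next one starts is not modeled.

```python
WAVE_ANNOTATION = "argocd.argoproj.io/sync-wave"


def wave_of(resource):
    """Read the sync wave from the annotation, treating a missing annotation as wave 0."""
    annotations = resource.get("metadata", {}).get("annotations", {})
    return int(annotations.get(WAVE_ANNOTATION, "0"))


def group_by_wave(resources):
    """Return lists of 'Kind/name' strings, one list per wave, lowest wave first."""
    waves = {}
    for r in resources:
        waves.setdefault(wave_of(r), []).append(f"{r['kind']}/{r['metadata']['name']}")
    return [waves[w] for w in sorted(waves)]


resources = [
    {"kind": "Service", "metadata": {"name": "myapp", "annotations": {WAVE_ANNOTATION: "3"}}},
    {"kind": "Deployment", "metadata": {"name": "myapp", "annotations": {WAVE_ANNOTATION: "2"}}},
    {"kind": "Namespace", "metadata": {"name": "myapp"}},
    {"kind": "ConfigMap", "metadata": {"name": "myapp-config", "annotations": {WAVE_ANNOTATION: "1"}}},
]
print(group_by_wave(resources))
# [['Namespace/myapp'], ['ConfigMap/myapp-config'], ['Deployment/myapp'], ['Service/myapp']]
```

Note that the annotation value is a quoted string in YAML, which is why the sketch converts it with `int` before sorting.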

Resource Hooks

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  namespace: myapp
  annotations:
    # Hook types: PreSync, Sync, PostSync, SyncFail, Skip
    argocd.argoproj.io/hook: PreSync

    # Hook deletion policy
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
    # Options: HookSucceeded, HookFailed, BeforeHookCreation

    # Sync wave for the hook
    argocd.argoproj.io/sync-wave: "1"
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp:migrations
          command: ["./migrate.sh"]
      restartPolicy: Never
  backoffLimit: 3
---
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
  namespace: myapp
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: test
          image: myapp:tests
          command: ["./smoke-test.sh"]
      restartPolicy: Never

Health Checks and Resource Customizations

Custom Health Checks

# ConfigMap in the argocd namespace
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Health check for custom resources (CRDs)
  resource.customizations.health.argoproj.io_Application: |
    hs = {}
    hs.status = "Progressing"
    hs.message = ""
    if obj.status ~= nil then
      if obj.status.health ~= nil then
        hs.status = obj.status.health.status
        if obj.status.health.message ~= nil then
          hs.message = obj.status.health.message
        end
      end
    end
    return hs

  # Health check for cert-manager Certificates
  resource.customizations.health.cert-manager.io_Certificate: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.conditions ~= nil then
        for i, condition in ipairs(obj.status.conditions) do
          if condition.type == "Ready" and condition.status == "False" then
            hs.status = "Degraded"
            hs.message = condition.message
            return hs
          end
          if condition.type == "Ready" and condition.status == "True" then
            hs.status = "Healthy"
            hs.message = condition.message
            return hs
          end
        end
      end
    end
    hs.status = "Progressing"
    hs.message = "Waiting for certificate"
    return hs

Ignoring Resource Differences

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Ignore differences in specific fields
  resource.customizations.ignoreDifferences.apps_Deployment: |
    jsonPointers:
      - /spec/replicas
    jqPathExpressions:
      - .spec.template.spec.containers[].env[] | select(.name == "DYNAMIC_VAR")

  # Ignore differences for all resources
  resource.customizations.ignoreDifferences.all: |
    managedFieldsManagers:
      - kube-controller-manager

Known Type Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Resource tracking method
  application.resourceTrackingMethod: annotation+label

  # Exclude resources from discovery and sync
  resource.exclusions: |
    - apiGroups:
      - "*"
      kinds:
      - ProviderConfigUsage
      clusters:
      - "*"

RBAC Configuration

AppProject with RBAC

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a
  namespace: argocd
spec:
  description: Team A project

  # Source repositories
  sourceRepos:
    - "https://github.com/team-a/*"
    - "https://charts.team-a.com"

  # Destination clusters and namespaces
  destinations:
    - namespace: "team-a-*"
      server: https://kubernetes.default.svc
    - namespace: team-a-shared
      server: https://prod-cluster.example.com

  # Cluster-scoped resource allow list (what may be deployed)
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
    - group: "rbac.authorization.k8s.io"
      kind: ClusterRole

  # Namespaced resource deny list (what may not be deployed)
  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota
    - group: ""
      kind: LimitRange

  # Project roles
  roles:
    - name: developer
      description: Developer role
      policies:
        - p, proj:team-a:developer, applications, get, team-a/*, allow
        - p, proj:team-a:developer, applications, sync, team-a/*, allow
      groups:
        - team-a-developers

    - name: admin
      description: Administrator role
      policies:
        - p, proj:team-a:admin, applications, *, team-a/*, allow
        - p, proj:team-a:admin, repositories, *, team-a/*, allow
      groups:
        - team-a-admins

  # Orphaned resource monitoring
  orphanedResources:
    warn: true
    ignore:
      - group: ""
        kind: ConfigMap
        name: ignore-this-cm

Global RBAC Policy

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly

  policy.csv: |
    # Format: p, subject, resource, action, object, effect

    # Grant the admin role to a group
    g, platform-team, role:admin

    # Custom role: application deployer
    p, role:app-deployer, applications, get, */*, allow
    p, role:app-deployer, applications, sync, */*, allow
    p, role:app-deployer, applications, override, */*, allow
    p, role:app-deployer, repositories, get, *, allow

    # Grant the application deployer role to a group
    g, deployer-team, role:app-deployer

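    # User-specific permissions (the subject is the user name or SSO claim, without a prefix)
    p, jane@example.com, applications, *, default/*, allow
    p, john@example.com, clusters, get, https://prod-cluster, allow

    # Viewer role: may view applications in any project but may not sync them
    p, role:project-viewer, applications, get, */*, allow
    p, role:project-viewer, applications, sync, */*, deny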

  scopes: "[groups, email]"
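
The way `allow` and `deny` rules in `policy.csv` combine can be sketched as follows: a request passes when at least one matching rule allows it and no matching rule denies it. This is a simplified model with a hypothetical `is_allowed` helper; group-to-role resolution (`g` lines) is assumed to have happened already, glob matching uses `fnmatch`, and the `policy.default` fallback is not modeled.

```python
from fnmatch import fnmatchcase

# (subject, resource, action, object pattern, effect), taken from the policy above
POLICIES = [
    ("role:app-deployer", "applications", "get", "*/*", "allow"),
    ("role:app-deployer", "applications", "sync", "*/*", "allow"),
    ("role:project-viewer", "applications", "get", "*/*", "allow"),
    ("role:project-viewer", "applications", "sync", "*/*", "deny"),
]


def is_allowed(policies, subjects, resource, action, obj):
    """Allow if some matching rule says allow and no matching rule says deny."""
    matched = [
        effect
        for (sub, res, act, pattern, effect) in policies
        if sub in subjects
        and fnmatchcase(resource, res)
        and fnmatchcase(action, act)
        and fnmatchcase(obj, pattern)
    ]
    return "allow" in matched and "deny" not in matched


print(is_allowed(POLICIES, {"role:app-deployer"}, "applications", "sync", "default/myapp"))    # True
print(is_allowed(POLICIES, {"role:project-viewer"}, "applications", "sync", "default/myapp"))  # False
```

One consequence worth noting: a subject holding both roles would be denied `sync`, because a matching deny takes precedence over any allow.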

Sync Policies and Strategies

Automated Sync

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: auto-sync-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp

  syncPolicy:
    automated:
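      # Sync automatically when Git changes
      prune: true # Delete resources that were removed from Git
      selfHeal: true # Revert manual changes made to the cluster
      allowEmpty: false # Prevent a sync that would leave the application with no resources

    syncOptions:
      # Create the namespace if it is missing
      - CreateNamespace=true

      # Validate resources before syncing
      - Validate=true

      # Use server-side apply (kubectl apply --server-side)
      - ServerSideApply=true

      # Prune resources with foreground propagation
      - PrunePropagationPolicy=foreground

      # Prune resources last (after new resources are created)
      - PruneLast=true

      # Replace resources instead of applying them (disabled here)
      - Replace=false

      # Respect ignoreDifferences during sync
      - RespectIgnoreDifferences=true

    # Retry policy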
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m

Manual Sync with Selective Resources

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: manual-sync-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp

  # No automated sync policy: manual sync only
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      - PruneLast=true

  # Ignore differences for specific resources
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
    - group: ""
      kind: Service
      managedFieldsManagers:
        - kube-controller-manager

Secrets Management

Sealed Secrets Integration

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-with-sealed-secrets
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
    # Sealed Secrets are stored in Git
    # SealedSecret resources are decrypted automatically by the controller in the cluster
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true

External Secrets Operator

# ExternalSecret stored in the Git repository
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-secrets
  namespace: myapp
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: myapp-secret
    creationPolicy: Owner
  data:
    - secretKey: db-password
      remoteRef:
        key: myapp/production/db
        property: password

Argo CD Vault Plugin

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-vault
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
    plugin:
      name: argocd-vault-plugin
      env:
        - name: AVP_TYPE
          value: vault
        - name: AVP_AUTH_TYPE
          value: k8s
        - name: AVP_K8S_ROLE
          value: argocd
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp

Secrets in Helm Values (Encrypted)

# Encrypt the values file with SOPS or git-crypt
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-helm-secrets
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/charts.git
    targetRevision: HEAD
    path: charts/myapp
    helm:
      valueFiles:
        - values.yaml
        # Encrypted with SOPS, decrypted by the plugin
        - secrets://values-secrets.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp

Multi-Tenancy Best Practices

Tenant Isolation with AppProjects

apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: tenant-alpha
  namespace: argocd
spec:
  description: Isolated project for tenant Alpha

  sourceRepos:
    - "https://github.com/tenant-alpha/*"

  destinations:
    - namespace: "tenant-alpha-*"
      server: https://kubernetes.default.svc

  clusterResourceWhitelist:
    - group: ""
      kind: Namespace

  namespaceResourceWhitelist:
    - group: "*"
      kind: "*"

  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota
    - group: ""
      kind: LimitRange
    - group: "rbac.authorization.k8s.io"
      kind: "*"

  roles:
    - name: tenant-admin
      policies:
        - p, proj:tenant-alpha:tenant-admin, applications, *, tenant-alpha/*, allow
      groups:
        - tenant-alpha-admins

Per-Tenant Resource Quotas

apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-alpha-quota
  namespace: tenant-alpha-prod
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    persistentvolumeclaims: "10"
    services.loadbalancers: "5"
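
Once a quota constrains CPU and memory, pods that omit requests or limits can be rejected at admission. A LimitRange in the same namespace can supply defaults; the values below are illustrative assumptions, not recommendations from the original text.

```yaml
# Hypothetical defaults for containers in the tenant namespace
apiVersion: v1
kind: LimitRange
metadata:
  name: tenant-alpha-defaults
  namespace: tenant-alpha-prod
spec:
  limits:
    - type: Container
      defaultRequest:        # used when a container declares no requests
        cpu: 100m
        memory: 128Mi
      default:               # used when a container declares no limits
        cpu: 500m
        memory: 512Mi
```

Since the tenant AppProject blacklists ResourceQuota and LimitRange, both objects would be managed by the platform team from a separate project.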

Network Policies for Tenant Isolation

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation
  namespace: tenant-alpha-prod
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
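  ingress:
    # Allow traffic from pods in the same namespace
    - from:
        - podSelector: {}
    # Allow traffic from the ingress controller namespace
    # (kubernetes.io/metadata.name is set automatically on every namespace since Kubernetes 1.21)
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
  egress:
    # Allow traffic to pods in the same namespace
    - to:
        - podSelector: {}
    # Allow DNS lookups to kube-system (DNS falls back to TCP for large responses)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53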

Progressive Delivery

Argo Rollouts Integration

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-rollout
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp-rollout
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
  namespace: myapp
spec:
  replicas: 5
  strategy:
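    canary:
      # Istio traffic routing needs both Services; these names are placeholders
      canaryService: myapp-canary
      stableService: myapp-stable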
      steps:
        - setWeight: 20
        - pause: { duration: 10m }
        - setWeight: 40
        - pause: { duration: 10m }
        - setWeight: 60
        - pause: { duration: 10m }
        - setWeight: 80
        - pause: { duration: 10m }
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 2
      trafficRouting:
        istio:
          virtualService:
            name: myapp-vsvc
            routes:
              - primary
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:stable
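
The Rollout refers to an AnalysisTemplate named success-rate that is not defined here. A possible definition is sketched below, assuming a Prometheus instance reachable at the address shown and Istio request metrics for the service; the address, query, and 95% threshold are all assumptions.

```yaml
# Hypothetical AnalysisTemplate backing templateName: success-rate
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
  namespace: myapp
spec:
  metrics:
    - name: success-rate
      interval: 1m
      # Pass while at least 95% of requests are non-5xx
      successCondition: result[0] >= 0.95
      # Abort the rollout after three failed measurements
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090   # assumed address
          query: |
            sum(rate(istio_requests_total{destination_service_name="myapp",response_code!~"5.*"}[5m]))
            /
            sum(rate(istio_requests_total{destination_service_name="myapp"}[5m]))
```

Because the Rollout sets startingStep: 2, this analysis would begin running in the background once the canary reaches the 40% weight step.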

Monitoring and Observability

Prometheus Metrics

apiVersion: v1
kind: Service
metadata:
  name: argocd-metrics
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-metrics
spec:
  ports:
    - name: metrics
      port: 8082
      protocol: TCP
      targetPort: 8082
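  selector:
    # Port 8082 is served by the application controller (argocd-server exposes its metrics on 8083)
    app.kubernetes.io/name: argocd-application-controller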
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
  namespace: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-metrics
  endpoints:
    - port: metrics
      interval: 30s

Notification Templates

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  service.slack: |
    token: $slack-token

  template.app-deployed: |
    message: |
      Application {{.app.metadata.name}} is now running a new version.
    slack:
      attachments: |
        [{
          "title": "{{ .app.metadata.name}}",
          "title_link":"{{.context.argocdUrl}}/applications/{{.app.metadata.name}}",
          "color": "#18be52",
          "fields": [
          {
            "title": "Sync Status",
            "value": "{{.app.status.sync.status}}",
            "short": true
          },
          {
            "title": "Repository",
            "value": "{{.app.spec.source.repoURL}}",
            "short": true
          }
          ]
        }]

  template.app-health-degraded: |
    message: |
      Application {{.app.metadata.name}} health has degraded.
    slack:
      attachments: |
        [{
          "title": "{{ .app.metadata.name}}",
          "title_link": "{{.context.argocdUrl}}/applications/{{.app.metadata.name}}",
          "color": "#f4c030",
          "fields": [
          {
            "title": "Health Status",
            "value": "{{.app.status.health.status}}",
            "short": true
          },
          {
            "title": "Message",
            "value": "{{.app.status.health.message}}",
            "short": false
          }
          ]
        }]

  trigger.on-deployed: |
    - when: app.status.operationState.phase in ['Succeeded']
      send: [app-deployed]

  trigger.on-health-degraded: |
    - when: app.status.health.status == 'Degraded'
      send: [app-health-degraded]
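
The $slack-token reference in service.slack is resolved from the argocd-notifications-secret Secret. A minimal sketch of that Secret is shown below; the token value is a placeholder and in practice would itself be supplied through one of the secret-management approaches described earlier rather than committed in plaintext.

```yaml
# Secret read by the notifications controller; the key name matches $slack-token
apiVersion: v1
kind: Secret
metadata:
  name: argocd-notifications-secret
  namespace: argocd
type: Opaque
stringData:
  # Placeholder: Slack bot token for the workspace
  slack-token: <slack-bot-token>
```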

Application Annotations for Notifications

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
  annotations:
    notifications.argoproj.io/subscribe.on-deployed.slack: my-channel
    notifications.argoproj.io/subscribe.on-health-degraded.slack: alerts-channel
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp

CLI Operations

Application Management

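# Create an application
argocd app create myapp \
  --repo https://github.com/myorg/myrepo.git \
  --path apps/myapp \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace myapp \
  --sync-policy automated \
  --auto-prune \
  --self-heal

# List applications
argocd app list

# Show application details
argocd app get myapp

# Sync an application
argocd app sync myapp

# Sync a single resource (format: GROUP:KIND:NAME)
argocd app sync myapp --resource apps:Deployment:myapp

# Roll back to the previous version (automated sync must be disabled first)
argocd app rollback myapp

# Delete the application (its resources are cascade-deleted by default)
argocd app delete myapp

# Same, with cascade made explicit; pass --cascade=false to keep the deployed resources
argocd app delete myapp --cascade

# Show the diff between the live state and the target state
argocd app diff myapp

# Wait until the application reports healthy (add --sync to also wait for the sync)
argocd app wait myapp --health

# Override a Helm parameter
argocd app set myapp --helm-set replicaCount=3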

Repository Management

# Add a repository
argocd repo add https://github.com/myorg/myrepo.git \
  --username myuser \
  --password mytoken

# Add a private repository over SSH
argocd repo add git@github.com:myorg/myrepo.git \
  --ssh-private-key-path ~/.ssh/id_rsa

# List repositories
argocd repo list

# Remove a repository
argocd repo rm https://github.com/myorg/myrepo.git

Cluster Management

# Add a cluster (the argument is a kubeconfig context name)
argocd cluster add my-cluster-context

# List clusters
argocd cluster list

# Remove a cluster
argocd cluster rm https://my-cluster.example.com

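Project Management

# Create a project (quote the glob patterns so the shell does not expand them)
argocd proj create myproject \
  --description "My project" \
  --src 'https://github.com/myorg/*' \
  --dest 'https://kubernetes.default.svc,myapp-*'

# Add a role to the project
argocd proj role create myproject developer

# Add a policy to the role ('*' matches every application in the project)
argocd proj role add-policy myproject developer \
  --action get --permission allow \
  --object '*'

# List projects
argocd proj list

# Show project details
argocd proj get myproject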

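Best Practices

Repository Organization

  1. Separate configuration from code: keep application source code and Kubernetes manifests in separate repositories
  2. Environment branches or directories: use either a branch-per-environment or a directory-per-environment strategy
  3. Immutable revisions: pin production deployments to a Git commit SHA or an immutable tag
  4. PR-based deployments: require a pull request for every change to production manifests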
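
As a sketch of the immutable-revision practice, a production Application can set targetRevision to a tag or commit SHA instead of HEAD. The application name and the tag below are hypothetical.

```yaml
# Hypothetical production Application pinned to an immutable revision
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    # Assumed tag; a full commit SHA works the same way
    targetRevision: v1.4.2
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
```

Promoting a release then becomes a pull request that changes targetRevision, which keeps the PR-based workflow and the pinned revision in one reviewable change.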

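Application Design

  1. One Application per microservice: create a separate Argo CD Application for each microservice
  2. Use AppProjects: group related applications and enforce RBAC boundaries
  3. Use sync waves: order resource creation with sync waves and hooks
  4. Health checks: define custom health checks for CRDs and custom resources
  5. Resource limits: always define resource requests and limits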
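
The ordering and resource-limit points can be illustrated with annotations on ordinary manifests. In this sketch a migration Job runs as a PreSync hook, a ConfigMap is applied in an early wave, and the Deployment follows in a later wave with explicit requests and limits; all names, images, and values are assumptions.

```yaml
# Runs before the main sync and is removed once it succeeds
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: myapp-migrations:stable   # assumed image
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  annotations:
    argocd.argoproj.io/sync-wave: "-1"     # applied before the default wave 0
data:
  LOG_LEVEL: info
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  annotations:
    argocd.argoproj.io/sync-wave: "1"      # applied after earlier waves are healthy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:stable
          resources:
            requests: {cpu: 100m, memory: 128Mi}
            limits: {cpu: 500m, memory: 512Mi}
```

Lower wave numbers are applied first, and Argo CD waits for each wave to become healthy before moving to the next.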

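Security

  1. Least-privilege RBAC: grant each team or project only the permissions it needs
  2. Encrypt secrets: never commit plaintext secrets to Git
  3. Separate credentials: use different Git credentials for different environments
  4. Audit logs: enable and monitor Argo CD audit logs
  5. Network policies: restrict network access to Argo CD components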
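
A least-privilege setup could look like the following argocd-rbac-cm sketch: everyone defaults to read-only, and one SSO group may view and sync applications in a single project. The role name, group name, and project are hypothetical.

```yaml
# Hypothetical global RBAC configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  # Users without an explicit role get the built-in read-only role
  policy.default: role:readonly
  policy.csv: |
    # Allow the role to view and sync applications in the tenant-alpha project only
    p, role:tenant-alpha-dev, applications, get, tenant-alpha/*, allow
    p, role:tenant-alpha-dev, applications, sync, tenant-alpha/*, allow
    # Map the assumed SSO group to the role
    g, tenant-alpha-devs, role:tenant-alpha-dev
```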

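Sync Strategy

  1. Automated sync for non-production: enable auto-sync and self-heal for development and staging
  2. Manual sync for production: require manual approval for production syncs
  3. Prune carefully: use prune: true with care and consider the PruneLast sync option
  4. Sync windows: configure sync windows to prevent deployments during business hours
  5. Progressive delivery: use Argo Rollouts for canary and blue-green deployments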
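
Sync windows are declared on the AppProject. The sketch below denies automated syncs on weekdays from 09:00 for eight hours while still permitting manual syncs; the project name, repositories, and schedule are assumptions.

```yaml
# Hypothetical project with a business-hours deny window
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  sourceRepos:
    - "https://github.com/myorg/*"
  destinations:
    - namespace: "*"
      server: https://kubernetes.default.svc
  syncWindows:
    - kind: deny
      # Cron schedule: 09:00, Monday to Friday
      schedule: "0 9 * * 1-5"
      duration: 8h
      applications:
        - "*"
      # Operators can still trigger a sync by hand during the window
      manualSync: true
```

Combining a deny window with manualSync: true keeps emergency fixes possible without letting routine commits roll out mid-day.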

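Multi-Cluster Management

  1. Cluster naming: use a consistent naming convention for clusters
  2. Cluster labels: label clusters by environment, region, and purpose
  3. ApplicationSets: use ApplicationSets to manage applications across clusters
  4. Cluster secrets: rotate cluster credentials regularly
  5. Disaster recovery: keep Argo CD configuration in Git so it can be restored easily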
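
Cluster labels and ApplicationSets work together through the cluster generator. This sketch creates one Application per registered cluster whose cluster Secret carries an environment: production label; the label key and value are assumptions.

```yaml
# Hypothetical ApplicationSet fanning myapp out to every production cluster
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: myapp-all-clusters
  namespace: argocd
spec:
  generators:
    - clusters:
        selector:
          matchLabels:
            environment: production   # assumed label on the cluster Secret
  template:
    metadata:
      # {{name}} and {{server}} are filled in from each matching cluster
      name: "myapp-{{name}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myrepo.git
        targetRevision: HEAD
        path: apps/myapp
      destination:
        server: "{{server}}"
        namespace: myapp
```

Registering a new cluster with the matching label is then enough for the application to appear there.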

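Observability

  1. Metrics: export Prometheus metrics and build dashboards
  2. Notifications: configure notifications for sync failures and health degradation
  3. Logs: centralize Argo CD logs for troubleshooting
  4. Tracing: enable distributed tracing for complex deployments
  5. Alerts: set up alerts for out-of-sync applications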
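
An out-of-sync alert can be built on the argocd_app_info metric exported by the application controller. The rule below is a sketch that assumes the Prometheus Operator is installed; the rule name, 15-minute threshold, and severity are illustrative.

```yaml
# Hypothetical alert for applications that stay OutOfSync
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-alerts
  namespace: argocd
spec:
  groups:
    - name: argocd
      rules:
        - alert: ArgoCDAppOutOfSync
          # argocd_app_info carries the sync status as a label
          expr: argocd_app_info{sync_status="OutOfSync"} == 1
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Application {{ $labels.name }} has been OutOfSync for 15 minutes"
```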

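Performance

  1. Resource limits: set appropriate resource limits for Argo CD components
  2. Sharding: use controller sharding for large deployments (1000+ applications)
  3. Cache tuning: configure Redis to improve performance
  4. Webhook-based sync: use Git webhooks instead of polling for faster syncs
  5. Selective sync: use resource inclusions/exclusions to narrow the sync scope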
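
Controller sharding is enabled by scaling the application controller StatefulSet and telling each replica how many shards exist. The patch below is a sketch intended for use as a Kustomize strategic merge patch over the upstream install manifest; the replica count of 2 is an assumption.

```yaml
# Hypothetical patch: run two application controller shards
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            # Must match spec.replicas so clusters are distributed across shards
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "2"
```

Sharding distributes managed clusters across replicas, so it helps most when many clusters are registered rather than when one cluster hosts every application.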

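Disaster Recovery

  1. Back up configuration: store all Argo CD configuration in Git
  2. Multiple Argo CD instances: run separate instances for different environments
  3. Export applications: export application definitions regularly
  4. Document procedures: maintain runbooks for disaster recovery
  5. Test recovery: test disaster recovery procedures regularly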

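Troubleshooting

Common Issues

Application Stuck in Progressing State

# Check the application status
argocd app get myapp

# Inspect sync status and health on the Application resource
kubectl get application myapp -n argocd -o yaml

# Re-sync using replace instead of apply (destructive: resources may be recreated)
argocd app sync myapp --replace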

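OutOfSync Despite No Changes

# Hard refresh (bypasses the manifest cache)
argocd app get myapp --hard-refresh

# Inspect the remaining diff to find fields that should be ignored
argocd app diff myapp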
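
When the diff shows a field that another controller keeps rewriting, the Application can be told to ignore it. The fragment below assumes the drifting field is a Deployment replica count managed by a HorizontalPodAutoscaler; substitute whatever the diff actually reports.

```yaml
# Fragment of an Application spec (hypothetical drift on /spec/replicas)
spec:
  ignoreDifferences:
    # Replica count is owned by the autoscaler, not by Git
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
  syncPolicy:
    syncOptions:
      # Also leave the ignored fields untouched during a sync
      - RespectIgnoreDifferences=true
```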

Permission Denied Errors

# Check project permissions
argocd proj get myproject

# Verify the RBAC policy
kubectl get cm argocd-rbac-cm -n argocd -o yaml

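Sync Fails with Validation Errors

# Skip schema validation for this application, then sync
argocd app set myapp --sync-option Validate=false
argocd app sync myapp

# Or set it declaratively under spec.syncPolicy
syncOptions:
  - Validate=false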

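Debug Commands

# Enable debug logging for the CLI
argocd app sync myapp --loglevel debug

# Get events for the application
kubectl get events -n argocd --field-selector involvedObject.name=myapp

# Check application controller logs (the controller runs as a StatefulSet)
kubectl logs -n argocd statefulset/argocd-application-controller

# Check API server logs
kubectl logs -n argocd deployment/argocd-server

# List the application's resources
argocd app resources myapp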

References