name: argocd
description: GitOps continuous delivery for Kubernetes deployments with Argo CD. Use when implementing declarative GitOps workflows, application sync/rollback, multi-cluster deployments, progressive delivery, or CD automation. Triggers: argocd, argo cd, gitops, application, sync, rollback, app of apps, applicationset, declarative, continuous delivery, CD, deployment automation, kubernetes deployment, multi-cluster, canary deployment, blue-green.
allowed-tools: Read, Grep, Glob, Edit, Write, Bash
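Argo CD GitOps Continuous Delivery
Overview
Argo CD is a declarative GitOps continuous delivery tool for Kubernetes that automates application deployment and lifecycle management. It follows the GitOps pattern, in which a Git repository is the source of truth for the desired application state.
Core Concepts
- Application: a group of Kubernetes resources defined by manifests in Git
- Application source type: the tool or format used to build the application (Helm, Kustomize, plain YAML, Jsonnet)
- Target state: the desired state of the application, as represented in Git
- Live state: the actual state of the application running in Kubernetes
- Sync status: whether the live state matches the target state
- Sync: the process of moving the live state to the target state
- Health: the health of the application's resources
- Refresh: comparing the latest code in Git with the live state
- Project: a logical grouping of applications with RBAC policies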
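Sync status and health are separate signals, and an application has converged only when both are good. The sketch below reads both from the Application resource; the function names are hypothetical, and it assumes kubectl can reach the cluster and that Argo CD runs in the argocd namespace.

```shell
# Print "<sync status>/<health status>" for an Argo CD Application.
# Assumes kubectl access and the "argocd" namespace unless one is passed.
app_status() {
  app="$1"
  ns="${2:-argocd}"
  kubectl get application "$app" -n "$ns" \
    -o jsonpath='{.status.sync.status}/{.status.health.status}'
}

# Succeed only when the live state matches Git and the resources are healthy.
app_is_converged() {
  [ "$(app_status "$@")" = "Synced/Healthy" ]
}
```

For example, `app_is_converged myapp && echo "myapp is in sync and healthy"` could gate a later pipeline step.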
Installation and Setup
Installing Argo CD in Kubernetes
# Create the namespace
kubectl create namespace argocd
# Install Argo CD
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Or, for production, apply the high-availability manifest instead of the one above
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/ha/install.yaml
# Access the UI
kubectl port-forward svc/argocd-server -n argocd 8080:443
# Get the initial admin password
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
# Install the CLI
brew install argocd # macOS
# Or download from https://github.com/argoproj/argo-cd/releases
Initial Configuration
# Log in through the CLI
argocd login localhost:8080
# Change the admin password
argocd account update-password
# Register an external cluster
argocd cluster add my-cluster-context
# Add a Git repository
argocd repo add https://github.com/myorg/myrepo.git --username myuser --password mytoken
Repository Structure
Recommended Directory Layout
gitops-repo/
├── apps/                      # Application definitions
│   ├── base/                  # Base application configuration
│   │   ├── app1/
│   │   │   ├── kustomization.yaml
│   │   │   └── deployment.yaml
│   │   └── app2/
│   └── overlays/              # Environment-specific overlays
│       ├── dev/
│       │   ├── kustomization.yaml
│       │   └── patches/
│       ├── staging/
│       └── production/
├── charts/                    # Helm charts (if using Helm)
│   └── myapp/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
├── argocd/                    # Argo CD configuration
│   ├── projects/              # AppProjects
│   ├── applications/          # Application manifests
│   │   ├── app1.yaml
│   │   └── app2.yaml
│   └── applicationsets/       # ApplicationSets
│       ├── cluster-apps.yaml
│       └── tenant-apps.yaml
└── bootstrap/                 # App of Apps bootstrap
    └── root-app.yaml
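Separation Strategies
Monorepo
A single repository for all environments
- Pros: simpler to manage, changes are easier to track
- Cons: every team has access, separation is harder to enforce
Repository per Environment
Separate repositories for dev, staging, and production
- Pros: stronger security boundaries, a clear promotion path
- Cons: more repositories to manage, duplicated configuration
Repository per Team
A separate repository for each team or service
- Pros: team autonomy, clear ownership
- Cons: cross-team coordination is more complex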
Application Manifests
Basic Application
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
  # The finalizer ensures the app's resources are deleted along with the Application
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  # Project name ('default' is the built-in project)
  project: default
  # Source configuration
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/production/myapp
  # Destination cluster and namespace
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp-production
  # Sync policy
  syncPolicy:
    automated:
      prune: true # Delete resources that are no longer defined in Git
      selfHeal: true # Re-sync automatically when the cluster state drifts
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
      - PruneLast=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
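To make the retry settings above concrete, here is a small sketch of exponential backoff as those fields suggest: start at duration, multiply by factor after each attempt, and cap at maxDuration. The function name is hypothetical, and the controller's exact timing may differ from this simplified model.

```shell
# Print the wait in seconds before each retry, one value per line.
# Usage: backoff_schedule <limit> <duration_s> <factor> <max_duration_s>
backoff_schedule() {
  limit="$1"; delay="$2"; factor="$3"; max="$4"
  attempt=1
  while [ "$attempt" -le "$limit" ]; do
    echo "$delay"
    delay=$((delay * factor))
    # Cap the next delay at maxDuration
    if [ "$delay" -gt "$max" ]; then delay="$max"; fi
    attempt=$((attempt + 1))
  done
}
```

With the values from the manifest, `backoff_schedule 5 5 2 180` prints 5, 10, 20, 40, and 80, so the 3m cap only starts to matter if the limit is raised.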
Application Using Helm
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-helm
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/charts.git
    targetRevision: main
    path: charts/myapp
    helm:
      # Helm values files
      valueFiles:
        - values.yaml
        - values-production.yaml
      # Inline values (override valueFiles; parameters below override these)
      values: |
        replicaCount: 3
        image:
          tag: v1.2.3
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
      # Override individual values (highest precedence)
      parameters:
        - name: image.repository
          value: myregistry.io/myapp
      # Whether to skip CRD installation
      skipCrds: false
      # Release name
      releaseName: myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Application Using Kustomize
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-kustomize
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/overlays/production
    kustomize:
      # Kustomize version (must match a version configured in Argo CD)
      version: v5.0.0
      # Name prefix/suffix
      namePrefix: prod-
      nameSuffix: -v1
      # Images to override
      images:
        - name: myapp
          newName: myregistry.io/myapp
          newTag: v1.2.3
      # Common labels
      commonLabels:
        environment: production
        managed-by: argocd
      # Common annotations
      commonAnnotations:
        deployed-by: argocd
      # Replica overrides
      replicas:
        - name: myapp-deployment
          count: 3
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp-production
ApplicationSets
Cluster Generator
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cluster-apps
  namespace: argocd
spec:
  # Generate an Application for every registered cluster that matches the selector
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production
          matchExpressions:
            - key: region
              operator: In
              values: [us-east-1, us-west-2]
        values:
          # Defaults made available to the template
          revision: main
  template:
    metadata:
      name: "{{name}}-myapp"
      labels:
        cluster: "{{name}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myrepo.git
        targetRevision: "{{values.revision}}"
        path: apps/production/myapp
        helm:
          parameters:
            - name: cluster.name
              value: "{{name}}"
            - name: cluster.region
              value: "{{metadata.labels.region}}"
      destination:
        server: "{{server}}"
        namespace: myapp
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
Git Directory Generator
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: git-directory-apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/myorg/myrepo.git
        revision: HEAD
        directories:
          - path: apps/production/*
          - path: apps/production/exclude-this
            exclude: true
  template:
    metadata:
      name: "{{path.basename}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myrepo.git
        targetRevision: HEAD
        path: "{{path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
Git File Generator
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: git-file-apps
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://github.com/myorg/myrepo.git
        revision: HEAD
        files:
          - path: apps/*/config.json
  template:
    metadata:
      name: "{{app.name}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myrepo.git
        targetRevision: HEAD
        path: "apps/{{app.name}}"
        helm:
          parameters:
            - name: replicaCount
              value: "{{app.replicas}}"
            - name: environment
              value: "{{app.environment}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{app.namespace}}"
Matrix Generator
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: matrix-apps
  namespace: argocd
spec:
  generators:
    # The matrix generator combines the output of two generators
    - matrix:
        generators:
          # First dimension: clusters
          - clusters:
              selector:
                matchLabels:
                  env: production
          # Second dimension: Git directories
          - git:
              repoURL: https://github.com/myorg/myrepo.git
              revision: HEAD
              directories:
                - path: apps/*
  template:
    metadata:
      name: "{{path.basename}}-{{name}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myrepo.git
        targetRevision: HEAD
        path: "{{path}}"
      destination:
        server: "{{server}}"
        namespace: "{{path.basename}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
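The matrix generator above yields one Application per (cluster, directory) pair. The sketch below reproduces that pairing for the name template "{{path.basename}}-{{name}}"; the function name and the sample cluster and directory names are hypothetical, and inputs are assumed to contain no spaces or glob characters.

```shell
# Print one generated Application name per (cluster, directory) pair.
# Usage: matrix_app_names "<cluster names>" "<directory paths>"
matrix_app_names() {
  # $1 and $2 are left unquoted on purpose so each word becomes one item
  for cluster in $1; do
    for dir in $2; do
      echo "$(basename "$dir")-$cluster"
    done
  done
}
```

For two clusters and two directories, `matrix_app_names "prod-east prod-west" "apps/api apps/web"` prints four names, starting with api-prod-east. The count is the product of both dimensions, which is worth checking before pointing a matrix at a large fleet.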
List Generator (Multi-Tenancy)
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: tenant-apps
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - tenant: team-a
            namespace: team-a-prod
            repoURL: https://github.com/team-a/apps.git
            quota:
              cpu: "10"
              memory: 20Gi
          - tenant: team-b
            namespace: team-b-prod
            repoURL: https://github.com/team-b/apps.git
            quota:
              cpu: "20"
              memory: 40Gi
  template:
    metadata:
      name: "{{tenant}}-app"
      labels:
        tenant: "{{tenant}}"
    spec:
      project: "{{tenant}}"
      source:
        repoURL: "{{repoURL}}"
        targetRevision: main
        path: production
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{namespace}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
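ApplicationSet Patterns
When to Use ApplicationSets
ApplicationSets use generators to automate the creation and management of multiple Argo CD Applications. Use them to:
- Deploy the same configuration to multiple clusters
- Manage multiple tenants or teams
- Discover applications from the Git repository structure
- Implement environment promotion strategies
Generator Selection Guide
| Generator | Use case | Example |
|---|---|---|
| Cluster | Deploy the same application to multiple clusters | Multi-region deployments |
| Git directory | Generate applications from the repository directory structure | Monorepo with one application per directory |
| Git file | Generate applications from config files in Git | Per-application JSON/YAML config |
| List | Static list of parameters | Tenant definitions |
| Matrix | Combine multiple generators | Applications across clusters and environments |
| Pull request | Preview environment for each PR | Ephemeral test environments |
| SCM provider | Discover repositories from GitHub/GitLab | Organization-wide application discovery |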
Multi-Environment with Git Directories
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: multi-env-apps
  namespace: argocd
spec:
  generators:
    - matrix:
        generators:
          # First: discover applications from the directory structure
          - git:
              repoURL: https://github.com/myorg/apps.git
              revision: HEAD
              directories:
                - path: apps/*
          # Second: apply each one to multiple environments
          - list:
              elements:
                - env: dev
                  cluster: https://dev-cluster.example.com
                  replicas: "1"
                - env: staging
                  cluster: https://staging-cluster.example.com
                  replicas: "2"
                - env: production
                  cluster: https://prod-cluster.example.com
                  replicas: "3"
  template:
    metadata:
      name: "{{path.basename}}-{{env}}"
      labels:
        app: "{{path.basename}}"
        env: "{{env}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/apps.git
        targetRevision: HEAD
        path: "{{path}}"
        helm:
          parameters:
            - name: environment
              value: "{{env}}"
            - name: replicaCount
              value: "{{replicas}}"
      destination:
        server: "{{cluster}}"
        namespace: "{{path.basename}}-{{env}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
Pull Request Preview Environments
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: pr-preview
  namespace: argocd
spec:
  generators:
    - pullRequest:
        github:
          owner: myorg
          repo: myapp
          tokenRef:
            secretName: github-token
            key: token
          labels:
            - preview
        requeueAfterSeconds: 60
  template:
    metadata:
      name: "myapp-pr-{{number}}"
      labels:
        preview: "true"
        pr: "{{number}}"
    spec:
      project: default
      source:
        repoURL: https://github.com/myorg/myapp.git
        targetRevision: "{{head_sha}}"
        path: k8s/overlays/preview
        kustomize:
          commonLabels:
            pr: "{{number}}"
          images:
            - name: myapp
              newTag: "pr-{{number}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "myapp-pr-{{number}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
SCM Provider Discovery
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: org-repos
  namespace: argocd
spec:
  generators:
    - scmProvider:
        github:
          organization: myorg
          tokenRef:
            secretName: github-token
            key: token
        filters:
          # Conditions inside one filter entry must all match; separate entries are alternatives
          - repositoryMatch: ".*-service$"
            pathsExist: [k8s/production]
  template:
    metadata:
      name: "{{repository}}"
    spec:
      project: default
      source:
        repoURL: "{{url}}"
        targetRevision: main
        path: k8s/production
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{repository}}"
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
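Sync Strategies
Strategy Selection Guide
| Strategy | Use case | Risk | Automation |
|---|---|---|---|
| Automated + self-heal | Non-production environments | Low | Full |
| Automated (no self-heal) | Staging that needs manual intervention | Medium | Partial |
| Manual | Production deployments | High | None |
| Sync windows | Business-hours restrictions | Medium | Scheduled |
| Progressive (rolling) | Gradual production rollouts | Low | Conditional |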
Automated Sync with Conditions
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: conditional-sync
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false
    syncOptions:
      - CreateNamespace=true
      - PruneLast=true
      - PrunePropagationPolicy=foreground
      - RespectIgnoreDifferences=true
      - ApplyOutOfSyncOnly=true
    # Retry with exponential backoff
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
  # Ignore manual changes to specific fields
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
    - group: ""
      kind: Service
      jqPathExpressions:
        - .spec.ports[] | select(.nodePort != null) | .nodePort
Sync Windows (Time-Based Deployment Control)
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  description: Production project with sync windows
  sourceRepos:
    - "*"
  destinations:
    - namespace: "*"
      server: https://prod-cluster.example.com
  # Define sync windows
  syncWindows:
    # Allow syncs during business hours (Mon-Fri, 9am-5pm UTC)
    - kind: allow
      schedule: "0 9 * * 1-5"
      duration: 8h
      applications:
        - "*"
      namespaces:
        - production-*
      clusters:
        - https://prod-cluster.example.com
    # Block syncs during peak traffic (daily, 12pm-2pm UTC)
    - kind: deny
      schedule: "0 12 * * *"
      duration: 2h
      applications:
        - "*"
    # Emergency window for critical-app: the schedule fires every minute, so the
    # window is effectively always open; manualSync: true permits manual syncs
    - kind: allow
      schedule: "* * * * *"
      duration: 1h
      manualSync: true
      applications:
        - critical-app
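Selective Sync (Resource-Level Control)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: selective-sync
  namespace: argocd
  annotations:
    # Prune=false keeps the annotated resource from being pruned by the app that
    # manages it (here, a parent app that owns this Application)
    argocd.argoproj.io/sync-options: Prune=false
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  # Ignore differences in specific fields when comparing live and target state
  ignoreDifferences:
    - group: "*"
      kind: Secret
      name: external-secret
      jsonPointers:
        - /data
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      # PruneResourcesOnDeletion does not appear among the documented sync options;
      # verify it against your Argo CD version before relying on it
      - PruneResourcesOnDeletion=true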
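Blue-Green Sync Strategy
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-blue-green
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    # Manual sync for production (no automated block)
    syncOptions:
      - CreateNamespace=true
  # Order the blue-green steps with sync waves. Waves are set on each resource with the
  # argocd.argoproj.io/sync-wave annotation, not as a field of the Application:
  #   wave 0: deploy the new version (green)
  #   wave 1: run smoke tests
  #   wave 2: switch traffic
  #   wave 3: remove the old version (blue)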
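Rollback Procedures
Automated Rollback Strategies
Application-Level Rollback
# View the deployment history
argocd app history myapp
# Roll back to the previous deployment
# (rollback is rejected while automated sync is enabled on the application)
argocd app rollback myapp
# Roll back to a specific history ID
argocd app rollback myapp 5
# Roll back and prune
argocd app rollback myapp 5 --prune
Git-Based Rollback (Recommended)
# Revert the Git commit
git revert HEAD
git push origin main
# Argo CD syncs the revert automatically when automated sync is enabled
# This keeps a complete audit trail in Git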
Rollback with Argo Rollouts
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
  namespace: myapp
spec:
  replicas: 5
  strategy:
    blueGreen:
      activeService: myapp-active
      previewService: myapp-preview
      autoPromotionEnabled: false
      autoPromotionSeconds: 30
      scaleDownDelaySeconds: 300
      scaleDownDelayRevisionLimit: 1
      # Anti-affinity between old and new ReplicaSet pods. Automatic rollback on
      # metric failure needs an analysis step (for example prePromotionAnalysis),
      # which is not shown here.
      antiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution: {}
  revisionHistoryLimit: 5
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:stable
---
# Manual rollback commands
# kubectl argo rollouts abort myapp
# kubectl argo rollouts undo myapp
# kubectl argo rollouts retry rollout myapp
Rollback on Health Check Failure
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: auto-rollback-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: false # Disable self-heal to keep manual control over rollbacks
    # Retry the sync on failure
    retry:
      limit: 3
      backoff:
        duration: 10s
        factor: 2
        maxDuration: 1m
    # Sync options. Argo CD has no built-in rollback on failed health checks;
    # the SyncFail hook below is a workaround.
    syncOptions:
      - Validate=true
      - FailOnSharedResource=false
# Use a PreSync hook to back up the current state
---
apiVersion: batch/v1
kind: Job
metadata:
  name: pre-sync-backup
  namespace: myapp
  annotations:
    argocd.argoproj.io/hook: PreSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      containers:
        - name: backup
          image: kubectl:latest
          command:
            - /bin/sh
            - -c
            - |
              kubectl get all -n myapp -o yaml > /backup/previous-state.yaml
      restartPolicy: Never
---
# Use a SyncFail hook to attempt an automatic rollback
apiVersion: batch/v1
kind: Job
metadata:
  name: rollback-on-fail
  namespace: myapp
  annotations:
    argocd.argoproj.io/hook: SyncFail
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      serviceAccountName: argocd-rollback
      containers:
        - name: rollback
          image: argoproj/argocd:latest
          command:
            - /bin/sh
            - -c
            - |
              # Assumes the Job is given the Argo CD server address and ARGOCD_TOKEN;
              # rollback is rejected while automated sync is enabled on the application
              argocd app rollback myapp --auth-token "$ARGOCD_TOKEN"
      restartPolicy: Never
Emergency Rollback Runbook
# 1. Check the application status
argocd app get myapp
argocd app history myapp
# 2. Identify the last known-good deployment and note its history ID
argocd app history myapp
# 3. Roll back quickly to the previous deployment
argocd app rollback myapp
# 4. If the rollback fails, force a sync with replace
argocd app sync myapp --force --replace --prune
# 5. If it still fails, revert in Git and force a sync
cd gitops-repo
git revert HEAD --no-commit
git commit -m "Emergency rollback"
git push origin main
argocd app sync myapp --force
# 6. If needed, clean up resources manually
kubectl delete deployment myapp -n myapp
argocd app sync myapp --force
# 7. Verify health and sync status
argocd app wait myapp --health --timeout 300
# 8. Record the incident
echo "Rollback completed at $(date)" >> /var/log/incidents/myapp-rollback.log
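Step 2 of the runbook can be scripted when the previous deployment is the known-good one. The sketch below is a hypothetical helper: it assumes `argocd app history` prints a header row followed by one row per deployment, oldest first, with the history ID in the first column. Check that assumption against the output of your CLI version before relying on it.

```shell
# Print the history ID of the deployment just before the current one.
# Prints nothing when the history has fewer than two entries.
previous_history_id() {
  argocd app history "$1" |
    awk 'NR > 1 { prev = last; last = $1 } END { if (prev != "") print prev }'
}
```

A rollback could then be written as `argocd app rollback myapp "$(previous_history_id myapp)"`, which makes the target explicit in shell history and incident notes.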
Rollback Testing (Staging)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: rollback-test
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp-test
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
# PostSync hook that exercises the rollback path
---
apiVersion: batch/v1
kind: Job
metadata:
  name: test-rollback
  namespace: myapp-test
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  template:
    spec:
      serviceAccountName: argocd-test
      containers:
        - name: test
          image: argoproj/argocd:latest
          command:
            - /bin/sh
            - -c
            - |
              # Check application health
              argocd app wait rollback-test --health --timeout 60
              # Run the rollback test (rollback is rejected while automated sync
              # is enabled, so disable it for the test run if needed)
              argocd app rollback rollback-test
              # Verify the rollback succeeded
              argocd app wait rollback-test --health --timeout 60
              # Re-sync to the latest revision
              argocd app sync rollback-test
      restartPolicy: Never
App of Apps Pattern
Root Application (Bootstrap)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root-app
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/gitops.git
    targetRevision: HEAD
    path: argocd/applications
    directory:
      recurse: true
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Infrastructure Applications
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: infrastructure
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/gitops.git
    targetRevision: HEAD
    path: argocd/infrastructure
    directory:
      recurse: true
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
Layered App of Apps
root-app
├── infrastructure (sync-wave: 0)
│   ├── cert-manager
│   ├── ingress-nginx
│   └── external-dns
├── platform (sync-wave: 1)
│   ├── monitoring
│   ├── logging
│   └── security
└── applications (sync-wave: 2)
    ├── app1
    ├── app2
    └── app3
Sync Waves and Hooks
Ordering with Sync Waves
apiVersion: v1
kind: Namespace
metadata:
  name: myapp
  annotations:
    # Lower numbers sync first
    argocd.argoproj.io/sync-wave: "0"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
  namespace: myapp
  annotations:
    argocd.argoproj.io/sync-wave: "1"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
  annotations:
    argocd.argoproj.io/sync-wave: "2"
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
  namespace: myapp
  annotations:
    argocd.argoproj.io/sync-wave: "3"
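The ordering rule above (lower wave numbers first) can be previewed with a plain sort. This is a simplified sketch with a hypothetical function name: it orders by wave and breaks ties alphabetically, whereas Argo CD applies its own tie-breaking by kind and name within a wave.

```shell
# Read "<wave> <kind>/<name>" lines on stdin and print them in wave order.
# Ties are broken alphabetically here, which simplifies Argo CD's behavior.
order_by_wave() {
  sort -k1,1n -k2,2
}
```

Feeding it the four resources from the manifests above, in any order, prints the Namespace (wave 0) first and the Service (wave 3) last.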
Resource Hooks
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  namespace: myapp
  annotations:
    # Hook types: PreSync, Sync, PostSync, SyncFail, Skip
    argocd.argoproj.io/hook: PreSync
    # Hook deletion policy
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
    # Options: HookSucceeded, HookFailed, BeforeHookCreation
    # Sync wave of the hook
    argocd.argoproj.io/sync-wave: "1"
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: myapp:migrations
          command: ["./migrate.sh"]
      restartPolicy: Never
  backoffLimit: 3
---
apiVersion: batch/v1
kind: Job
metadata:
  name: smoke-test
  namespace: myapp
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: test
          image: myapp:tests
          command: ["./smoke-test.sh"]
      restartPolicy: Never
Health Checks and Resource Customizations
Custom Health Checks
# ConfigMap in the argocd namespace
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Health check for custom CRDs
  resource.customizations.health.argoproj.io_Application: |
    hs = {}
    hs.status = "Progressing"
    hs.message = ""
    if obj.status ~= nil then
      if obj.status.health ~= nil then
        hs.status = obj.status.health.status
        if obj.status.health.message ~= nil then
          hs.message = obj.status.health.message
        end
      end
    end
    return hs
  # Health check for cert-manager Certificates
  resource.customizations.health.cert-manager.io_Certificate: |
    hs = {}
    if obj.status ~= nil then
      if obj.status.conditions ~= nil then
        for i, condition in ipairs(obj.status.conditions) do
          if condition.type == "Ready" and condition.status == "False" then
            hs.status = "Degraded"
            hs.message = condition.message
            return hs
          end
          if condition.type == "Ready" and condition.status == "True" then
            hs.status = "Healthy"
            hs.message = condition.message
            return hs
          end
        end
      end
    end
    hs.status = "Progressing"
    hs.message = "Waiting for certificate"
    return hs
Ignoring Resource Differences
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Ignore differences in specific fields
  resource.customizations.ignoreDifferences.apps_Deployment: |
    jsonPointers:
      - /spec/replicas
    jqPathExpressions:
      - .spec.template.spec.containers[].env[] | select(.name == "DYNAMIC_VAR")
  # Ignore differences across all resources
  resource.customizations.ignoreDifferences.all: |
    managedFieldsManagers:
      - kube-controller-manager
Known Types Configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Resource tracking method
  application.resourceTrackingMethod: annotation+label
  # Exclude resources from discovery and sync
  resource.exclusions: |
    - apiGroups:
        - "*"
      kinds:
        - ProviderConfigUsage
      clusters:
        - "*"
RBAC Configuration
AppProject with RBAC
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: team-a
  namespace: argocd
spec:
  description: Team A project
  # Source repositories
  sourceRepos:
    - "https://github.com/team-a/*"
    - "https://charts.team-a.com"
  # Destination clusters and namespaces
  destinations:
    - namespace: "team-a-*"
      server: https://kubernetes.default.svc
    - namespace: team-a-shared
      server: https://prod-cluster.example.com
  # Cluster-scoped resource allow list (what may be deployed)
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
    - group: "rbac.authorization.k8s.io"
      kind: ClusterRole
  # Namespaced resource deny list (what may not be deployed)
  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota
    - group: ""
      kind: LimitRange
  # Project roles
  roles:
    - name: developer
      description: Developer role
      policies:
        - p, proj:team-a:developer, applications, get, team-a/*, allow
        - p, proj:team-a:developer, applications, sync, team-a/*, allow
      groups:
        - team-a-developers
    - name: admin
      description: Admin role
      policies:
        - p, proj:team-a:admin, applications, *, team-a/*, allow
        - p, proj:team-a:admin, repositories, *, team-a/*, allow
      groups:
        - team-a-admins
  # Orphaned resource monitoring
  orphanedResources:
    warn: true
    ignore:
      - group: ""
        kind: ConfigMap
        name: ignore-this-cm
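Patterns such as "team-a-*" in destinations and "team-a/*" in policies are wildcard matches. The sketch below shows how such a pattern accepts or rejects a value, using the shell's own glob matching as a stand-in; the function name is hypothetical, and the shell's wildcard rules are only an approximation of Argo CD's matcher.

```shell
# Succeed when a value matches a glob pattern such as "team-a-*".
glob_match() {
  pattern="$1"
  value="$2"
  # The pattern is intentionally unquoted so that * acts as a wildcard
  case "$value" in
    $pattern) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `glob_match 'team-a-*' team-a-prod` succeeds, while `glob_match 'team-a-*' team-b-prod` fails. Running intended namespaces through a check like this before a rollout can catch a project that is scoped too narrowly or too broadly.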
Global RBAC Policy
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-rbac-cm
  namespace: argocd
data:
  policy.default: role:readonly
  policy.csv: |
    # Format: p, subject, resource, action, object, effect
    # Grant the admin role to a group
    g, platform-team, role:admin
    # Custom role: application deployer
    p, role:app-deployer, applications, get, */*, allow
    p, role:app-deployer, applications, sync, */*, allow
    p, role:app-deployer, applications, override, */*, allow
    p, role:app-deployer, repositories, get, *, allow
    # Grant the application deployer role to a group
    g, deployer-team, role:app-deployer
    # User-specific permissions
    p, user:jane@example.com, applications, *, default/*, allow
    p, user:john@example.com, clusters, get, https://prod-cluster, allow
    # Project-wide permissions
    p, role:project-viewer, applications, get, */*, allow
    p, role:project-viewer, applications, sync, */*, deny
  scopes: "[groups, email]"
Sync Policies and Strategies
Automated Sync
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: auto-sync-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      # Sync automatically when Git changes
      prune: true # Delete resources that were removed from Git
      selfHeal: true # Revert manual changes made in the cluster
      allowEmpty: false # Refuse to sync when the path renders no resources
    syncOptions:
      # Create the namespace if it is missing
      - CreateNamespace=true
      # Validate resources before syncing
      - Validate=true
      # Use server-side apply (kubectl apply --server-side)
      - ServerSideApply=true
      # Prune resources with foreground propagation
      - PrunePropagationPolicy=foreground
      # Prune last (after new resources are created)
      - PruneLast=true
      # Apply resources rather than replacing them
      - Replace=false
      # Respect ignoreDifferences during sync
      - RespectIgnoreDifferences=true
    # Retry policy
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
Manual Sync with Selective Resources
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: manual-sync-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  # No automated sync policy: manual only
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      - PruneLast=true
  # Ignore differences for specific resources
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas
    - group: ""
      kind: Service
      managedFieldsManagers:
        - kube-controller-manager
Secrets Management
Sealed Secrets Integration
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-with-sealed-secrets
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  # Sealed secrets are stored in Git
  # SealedSecret resources are decrypted in the cluster by the Sealed Secrets controller
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
External Secrets Operator
# ExternalSecret stored in the Git repository
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: myapp-secrets
  namespace: myapp
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: SecretStore
  target:
    name: myapp-secret
    creationPolicy: Owner
  data:
    - secretKey: db-password
      remoteRef:
        key: myapp/production/db
        property: password
Argo CD Vault Plugin
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-vault
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
    plugin:
      name: argocd-vault-plugin
      env:
        - name: AVP_TYPE
          value: vault
        - name: AVP_AUTH_TYPE
          value: k8s
        - name: AVP_K8S_ROLE
          value: argocd
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
Secrets in Helm Values (Encrypted)
# Encrypt the values file with SOPS or git-crypt
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-helm-secrets
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/charts.git
    targetRevision: HEAD
    path: charts/myapp
    helm:
      valueFiles:
        - values.yaml
        # Encrypted with SOPS, decrypted by the plugin
        - secrets://values-secrets.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
Multi-Tenancy Best Practices
Tenant Isolation with AppProjects
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: tenant-alpha
  namespace: argocd
spec:
  description: Isolated project for tenant Alpha
  sourceRepos:
    - "https://github.com/tenant-alpha/*"
  destinations:
    - namespace: "tenant-alpha-*"
      server: https://kubernetes.default.svc
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
  namespaceResourceWhitelist:
    - group: "*"
      kind: "*"
  namespaceResourceBlacklist:
    - group: ""
      kind: ResourceQuota
    - group: ""
      kind: LimitRange
    - group: "rbac.authorization.k8s.io"
      kind: "*"
  roles:
    - name: tenant-admin
      policies:
        - p, proj:tenant-alpha:tenant-admin, applications, *, tenant-alpha/*, allow
      groups:
        - tenant-alpha-admins
Resource Quotas per Tenant
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tenant-alpha-quota
  namespace: tenant-alpha-prod
spec:
  hard:
    requests.cpu: "100"
    requests.memory: 200Gi
    limits.cpu: "200"
    limits.memory: 400Gi
    persistentvolumeclaims: "10"
    services.loadbalancers: "5"
Network Policies for Tenant Isolation
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tenant-isolation
  namespace: tenant-alpha-prod
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Allow traffic from the same namespace
    - from:
        - podSelector: {}
    # Allow traffic from the ingress controller
    # (assumes the namespace carries a "name" label)
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
  egress:
    # Allow traffic to the same namespace
    - to:
        - podSelector: {}
    # Allow DNS
    - to:
        - namespaceSelector:
            matchLabels:
              name: kube-system
      ports:
        - protocol: UDP
          port: 53
Progressive Delivery
Argo Rollouts Integration
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp-rollout
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp-rollout
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
---
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
  namespace: myapp
spec:
  replicas: 5
  strategy:
    canary:
      steps:
        - setWeight: 20
        - pause: { duration: 10m }
        - setWeight: 40
        - pause: { duration: 10m }
        - setWeight: 60
        - pause: { duration: 10m }
        - setWeight: 80
        - pause: { duration: 10m }
      analysis:
        templates:
          - templateName: success-rate
        startingStep: 2
      trafficRouting:
        istio:
          virtualService:
            name: myapp-vsvc
            routes:
              - primary
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:stable
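The canary steps above alternate a traffic weight with a fixed pause, so the minimum time to full promotion is easy to work out. The sketch below prints that timeline; the function name is hypothetical, and it ignores analysis runs, pod start-up time, and manual pauses, all of which lengthen a real rollout.

```shell
# Print the elapsed minutes at which each canary weight takes effect.
# Usage: canary_timeline <pause_minutes> <weight>...
canary_timeline() {
  pause="$1"; shift
  elapsed=0
  for weight in "$@"; do
    echo "t+${elapsed}m weight=${weight}%"
    elapsed=$((elapsed + pause))
  done
  # After the final pause the new version receives all traffic
  echo "t+${elapsed}m weight=100%"
}
```

For the Rollout above, `canary_timeline 10 20 40 60 80` ends with `t+40m weight=100%`, so a bad release is exposed to at most 80% of traffic for at least the first 40 minutes.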
Monitoring and Observability
Prometheus Metrics
apiVersion: v1
kind: Service
metadata:
  name: argocd-metrics
  namespace: argocd
  labels:
    app.kubernetes.io/name: argocd-metrics
spec:
  ports:
    - name: metrics
      port: 8082
      protocol: TCP
      targetPort: 8082
  # Port 8082 serves the application controller metrics, so select the controller pods
  selector:
    app.kubernetes.io/name: argocd-application-controller
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-metrics
  namespace: argocd
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-metrics
  endpoints:
    - port: metrics
      interval: 30s
Notification Templates
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
  namespace: argocd
data:
  service.slack: |
    token: $slack-token
  template.app-deployed: |
    message: |
      Application {{.app.metadata.name}} is now running a new version.
    slack:
      attachments: |
        [{
          "title": "{{ .app.metadata.name}}",
          "title_link": "{{.context.argocdUrl}}/applications/{{.app.metadata.name}}",
          "color": "#18be52",
          "fields": [
            {
              "title": "Sync Status",
              "value": "{{.app.status.sync.status}}",
              "short": true
            },
            {
              "title": "Repository",
              "value": "{{.app.spec.source.repoURL}}",
              "short": true
            }
          ]
        }]
  template.app-health-degraded: |
    message: |
      Application {{.app.metadata.name}} health has degraded.
    slack:
      attachments: |
        [{
          "title": "{{ .app.metadata.name}}",
          "title_link": "{{.context.argocdUrl}}/applications/{{.app.metadata.name}}",
          "color": "#f4c030",
          "fields": [
            {
              "title": "Health Status",
              "value": "{{.app.status.health.status}}",
              "short": true
            },
            {
              "title": "Message",
              "value": "{{.app.status.health.message}}",
              "short": false
            }
          ]
        }]
  trigger.on-deployed: |
    - when: app.status.operationState.phase in ['Succeeded']
      send: [app-deployed]
  trigger.on-health-degraded: |
    - when: app.status.health.status == 'Degraded'
      send: [app-health-degraded]
Application Annotations for Notifications
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: myapp
  namespace: argocd
  annotations:
    notifications.argoproj.io/subscribe.on-deployed.slack: my-channel
    notifications.argoproj.io/subscribe.on-health-degraded.slack: alerts-channel
spec:
  project: default
  source:
    repoURL: https://github.com/myorg/myrepo.git
    targetRevision: HEAD
    path: apps/myapp
  destination:
    server: https://kubernetes.default.svc
    namespace: myapp
CLI Operations
Application Management
# Create an application
argocd app create myapp \
  --repo https://github.com/myorg/myrepo.git \
  --path apps/myapp \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace myapp \
  --sync-policy automated \
  --auto-prune \
  --self-heal
# List applications
argocd app list
# Get application details
argocd app get myapp
# Sync an application
argocd app sync myapp
# Sync a specific resource
argocd app sync myapp --resource apps:Deployment:myapp
# Roll back to the previous deployment
argocd app rollback myapp
# Delete an application (cascades to its resources by default)
argocd app delete myapp
# Delete an application with cascading stated explicitly
argocd app delete myapp --cascade
# Diff the live state against the target state
argocd app diff myapp
# Wait until the application is healthy
argocd app wait myapp --health
# Set application parameters
argocd app set myapp --helm-set replicaCount=3
Repository Management
# Add a repository
argocd repo add https://github.com/myorg/myrepo.git \
  --username myuser \
  --password mytoken
# Add a private repository over SSH
argocd repo add git@github.com:myorg/myrepo.git \
  --ssh-private-key-path ~/.ssh/id_rsa
# List repositories
argocd repo list
# Remove a repository
argocd repo rm https://github.com/myorg/myrepo.git
Cluster Management
# Add a cluster
argocd cluster add my-cluster-context
# List clusters
argocd cluster list
# Remove a cluster
argocd cluster rm https://my-cluster.example.com
Project Management
# Create a project (patterns are quoted so the shell does not expand them)
argocd proj create myproject \
  --description "My project" \
  --src 'https://github.com/myorg/*' \
  --dest 'https://kubernetes.default.svc,myapp-*'
# Add a role to the project
argocd proj role create myproject developer
# Add a policy to the role
argocd proj role add-policy myproject developer \
  --action get --permission allow \
  --object 'applications'
# List projects
argocd proj list
# Get project details
argocd proj get myproject
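Best Practices
Repository Organization
- Separate configuration from code: keep application code and Kubernetes manifests in separate repositories
- Environment branches or directories: use a branch-per-environment or directory-per-environment strategy
- Immutable tags: use Git commit SHAs or immutable tags for production deployments
- PR-based deployments: require pull requests for changes to production manifests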
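The immutable-tags practice above can be checked mechanically. The sketch below is a hypothetical helper that accepts a targetRevision only when it looks like a full 40-character commit SHA; it deliberately rejects HEAD, branch names, and tags, so it is stricter than the practice requires if immutable tags are also allowed.

```shell
# Succeed when a targetRevision looks like a full 40-character commit SHA.
is_commit_sha() {
  printf '%s' "$1" | grep -Eq '^[0-9a-f]{40}$'
}
```

A pre-merge check could run `is_commit_sha "$revision" || echo "production must pin a commit SHA"` for every production Application manifest.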
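Application Design
- One application per microservice: create a separate Argo CD Application for each microservice
- Use AppProjects: group related applications and enforce RBAC boundaries
- Use sync waves: order resource creation with sync waves and hooks
- Health checks: define custom health checks for CRDs and custom resources
- Resource limits: always define resource requests and limits
Security
- Least-privilege RBAC: grant each team or project only the permissions it needs
- Encrypted secrets: never commit plaintext secrets to Git
- Separate credentials: use different Git credentials for different environments
- Audit logs: enable and monitor Argo CD audit logs
- Network policies: restrict network access to Argo CD components
Sync Strategy
- Automated sync for non-production: enable auto-sync and self-heal for dev and staging
- Manual sync for production: require manual approval for production syncs
- Prune carefully: use prune: true with care and consider the PruneLast option
- Sync windows: configure sync windows to keep deployments out of sensitive periods such as peak business hours
- Progressive delivery: use Argo Rollouts for canary and blue-green deployments
Multi-Cluster Management
- Cluster naming: use a consistent naming convention for clusters
- Cluster labels: label clusters by environment, region, and purpose
- ApplicationSets: use ApplicationSets to manage applications across clusters
- Cluster secrets: rotate cluster credentials regularly
- Disaster recovery: keep Argo CD configuration in Git for easy recovery
Observability
- Metrics: export Prometheus metrics and build dashboards
- Notifications: configure notifications for sync failures and degraded health
- Logs: centralize Argo CD logs for troubleshooting
- Tracing: enable distributed tracing for complex deployments
- Alerts: set up alerts for out-of-sync applications
Performance
- Resource limits: set appropriate resource limits for Argo CD components
- Sharding: use controller sharding for large deployments (1000+ applications)
- Cache tuning: configure Redis to improve performance
- Webhook-based sync: use Git webhooks instead of polling for faster syncs
- Selective sync: use resource inclusions and exclusions to reduce sync scope
Disaster Recovery
- Back up configuration: store all Argo CD configuration in Git
- Multiple Argo CD instances: run separate instances for different environments
- Export applications: export application definitions regularly
- Document procedures: maintain runbooks for disaster recovery
- Test recovery: test disaster recovery procedures regularly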
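Troubleshooting
Common Issues
Application Stuck in Progressing State
# Check the application status
argocd app get myapp
# Check sync status and health
kubectl get application myapp -n argocd -o yaml
# Sync manually with replace
argocd app sync myapp --replace
OutOfSync Despite No Changes
# Hard refresh
argocd app get myapp --hard-refresh
# Inspect the reported differences
argocd app diff myapp
Permission Denied Errors
# Check project permissions
argocd proj get myproject
# Verify the RBAC policy
kubectl get cm argocd-rbac-cm -n argocd -o yaml
Sync Fails with Validation Errors
# Skip validation by setting the sync option on the application
argocd app set myapp --sync-option Validate=false
# Or add it to syncOptions in the manifest
syncOptions:
  - Validate=false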
Debug Commands
# Enable debug logging
argocd app sync myapp --loglevel debug
# Get application events
kubectl get events -n argocd --field-selector involvedObject.name=myapp
# Check the controller logs (the controller runs as a StatefulSet in current releases)
kubectl logs -n argocd statefulset/argocd-application-controller
# Check the server logs
kubectl logs -n argocd deployment/argocd-server
# Get resource details
argocd app resources myapp