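name: Deploying PostgreSQL on Kubernetes
description: |
  Deploys PostgreSQL on Kubernetes with automatic failover using the CloudNativePG operator.
  Use when setting up PostgreSQL for production workloads, high availability, or local K8s development.
  Covers operator installation, cluster creation, connection secrets, and backup configuration.
  Not for managed PostgreSQL services (such as Neon, RDS, Cloud SQL) or plain Docker containers.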
Deploying PostgreSQL on Kubernetes
Deploy a production-ready PostgreSQL cluster with automatic failover using the CloudNativePG operator (v1.28+).
Quick Start
# 1. Install the CloudNativePG operator
kubectl apply --server-side -f \
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.28/releases/cnpg-1.28.0.yaml
# 2. Wait for the operator to become ready
kubectl rollout status deployment -n cnpg-system cnpg-controller-manager
# 3. Deploy a PostgreSQL cluster
kubectl apply -f - <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-cluster
spec:
  instances: 3
  storage:
    size: 10Gi
EOF
# 4. Wait for the cluster to become ready
kubectl wait cluster/pg-cluster --for=condition=Ready --timeout=300s
Operator Installation
Direct manifest (recommended)
kubectl apply --server-side -f \
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.28/releases/cnpg-1.28.0.yaml
# Verify
kubectl rollout status deployment -n cnpg-system cnpg-controller-manager
kubectl get pods -n cnpg-system
Helm installation
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update
helm upgrade --install cnpg \
--namespace cnpg-system \
--create-namespace \
cnpg/cloudnative-pg
Namespace-scoped (tighter security)
helm upgrade --install cnpg \
--namespace cnpg-system \
--create-namespace \
--set config.clusterWide=false \
cnpg/cloudnative-pg
Cluster Configuration
Development (single instance)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-dev
spec:
  instances: 1
  imageName: ghcr.io/cloudnative-pg/postgresql:17.2
  primaryUpdateStrategy: unsupervised
  storage:
    size: 5Gi
  postgresql:
    parameters:
      max_connections: "100"
      shared_buffers: "256MB"
Production (3-instance high availability)
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-production
spec:
  instances: 3
  imageName: ghcr.io/cloudnative-pg/postgresql:17.2
  primaryUpdateStrategy: unsupervised
  storage:
    storageClass: standard
    size: 100Gi
  resources:
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "4Gi"
      cpu: "2"
  postgresql:
    parameters:
      max_connections: "200"
      shared_buffers: "1GB"
      effective_cache_size: "3GB"
      maintenance_work_mem: "256MB"
      checkpoint_completion_target: "0.9"
      wal_buffers: "16MB"
      default_statistics_target: "100"
      random_page_cost: "1.1"
      effective_io_concurrency: "200"
  affinity:
    podAntiAffinityType: required  # spread instances across nodes
  monitoring:
    enablePodMonitor: true
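The memory parameters in the production manifest line up with a commonly cited rule of thumb: shared_buffers at about 25% and effective_cache_size at about 75% of the memory available to PostgreSQL (here, the 4Gi limit). The sketch below reproduces that arithmetic. The function name suggest_pg_memory is hypothetical, and the ratios are a starting-point assumption rather than a CloudNativePG recommendation.

```shell
# suggest_pg_memory MEMORY_LIMIT_MIB
# Prints starting-point PostgreSQL memory parameters for a container memory limit.
# Assumption: 25% for shared_buffers and 75% for effective_cache_size
# (a common rule of thumb; tune for the actual workload).
suggest_pg_memory() {
  limit_mib=$1
  shared_buffers_mib=$((limit_mib / 4))
  effective_cache_mib=$((limit_mib * 3 / 4))
  printf 'shared_buffers: "%sMB"\n' "$shared_buffers_mib"
  printf 'effective_cache_size: "%sMB"\n' "$effective_cache_mib"
}
```

For example, `suggest_pg_memory 4096` prints 1024MB and 3072MB, which correspond to the 1GB and 3GB values used with the 4Gi limit.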
With a bootstrap database
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-cluster
spec:
  instances: 3
  storage:
    size: 10Gi
  bootstrap:
    initdb:
      database: learnflow
      owner: app_user
      secret:
        name: app-user-secret
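Create the secret first (the CloudNativePG examples use a basic-auth Secret with username and password keys):
kubectl create secret generic app-user-secret \
  --type=kubernetes.io/basic-auth \
  --from-literal=username=app_user \
  --from-literal=password="$(openssl rand -hex 16)"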
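If the bootstrap secret should be kept as a manifest rather than created imperatively, it can be rendered as YAML. This is a minimal sketch: the function name render_app_secret is hypothetical, and it assumes the values contain no double quotes or backslashes, which holds for a hex password. The kubernetes.io/basic-auth type mirrors the examples in the CloudNativePG documentation.

```shell
# render_app_secret SECRET_NAME DB_USER DB_PASSWORD
# Prints a basic-auth Secret manifest suitable for `kubectl apply -f -`.
# Assumption: values contain no double quotes or backslashes.
render_app_secret() {
  secret_name=$1
  db_user=$2
  db_password=$3
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${secret_name}
type: kubernetes.io/basic-auth
stringData:
  username: "${db_user}"
  password: "${db_password}"
EOF
}
```

A possible invocation is `render_app_secret app-user-secret app_user "$(openssl rand -hex 16)" | kubectl apply -f -`. The values are quoted so that an all-digit password is still read as a string.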
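Connection Secrets
CloudNativePG creates the application secret automatically. The superuser secret exists only when spec.enableSuperuserAccess is set to true, which is not the default:
| Secret | Contents |
|---|---|
| pg-cluster-app | Application credentials (recommended) |
| pg-cluster-superuser | Superuser credentials (requires enableSuperuserAccess: true) |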
Getting the connection string
# Get the application credentials
kubectl get secret pg-cluster-app -o jsonpath='{.data.uri}' | base64 -d
# Get the superuser credentials (administrative tasks only)
kubectl get secret pg-cluster-superuser -o jsonpath='{.data.uri}' | base64 -d
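The same pattern works for any single field of the secret, which helps when a client needs the host, user, and password separately instead of a full URI. A minimal sketch follows. The helper name secret_field is hypothetical, and the key names other than uri (for example username or password) are assumptions to confirm with `kubectl describe secret pg-cluster-app`.

```shell
# secret_field SECRET_NAME KEY
# Prints one base64-decoded field of a Secret in the current namespace.
secret_field() {
  kubectl get secret "$1" -o "jsonpath={.data.$2}" | base64 -d
}
```

For instance, `secret_field pg-cluster-app uri` is equivalent to the first command above.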
Using it in a Deployment
env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: pg-cluster-app
        key: uri
Service Endpoints
| Service | Port | Purpose |
|---|---|---|
| pg-cluster-rw | 5432 | Read-write (primary) |
| pg-cluster-ro | 5432 | Read-only (replicas) |
| pg-cluster-r | 5432 | Any instance |
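Connecting from another namespace
env:
  - name: DATABASE_URL
    # "default" is the namespace where pg-cluster runs; the password is a placeholder
    value: "postgresql://app_user:password@pg-cluster-rw.default.svc.cluster.local:5432/learnflow"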
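The host in that URL follows the usual Kubernetes service DNS pattern, `<service>.<namespace>.svc.<cluster-domain>`. The sketch below builds it for any of the three services. The function name cnpg_service_host is hypothetical, and cluster.local is assumed as the cluster domain unless a fourth argument overrides it.

```shell
# cnpg_service_host CLUSTER ROLE NAMESPACE [CLUSTER_DOMAIN]
# ROLE is one of rw (primary), ro (replicas), r (any instance).
# Assumption: the cluster domain is cluster.local unless given.
cnpg_service_host() {
  case "$2" in
    rw|ro|r) ;;
    *) echo "role must be rw, ro, or r" >&2; return 1 ;;
  esac
  printf '%s-%s.%s.svc.%s\n' "$1" "$2" "$3" "${4:-cluster.local}"
}
```

`cnpg_service_host pg-cluster rw default` yields the host used in the example above.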
Database Operations
Connecting with psql
# Using the kubectl cnpg plugin (recommended)
kubectl cnpg psql pg-cluster -- -c "SELECT version();"
# Or connect directly
kubectl exec -it pg-cluster-1 -- psql -U postgres
Creating a database and user
kubectl exec -i pg-cluster-1 -- psql -U postgres <<EOF
CREATE DATABASE myapp;
CREATE USER myapp_user WITH ENCRYPTED PASSWORD 'secure_password';
GRANT ALL PRIVILEGES ON DATABASE myapp TO myapp_user;
\c myapp
GRANT ALL ON SCHEMA public TO myapp_user;
EOF
Running migrations
# From a local machine (replace "password" with the value stored in the application secret)
kubectl port-forward svc/pg-cluster-rw 5432:5432 &
DATABASE_URL="postgresql://app_user:password@localhost:5432/learnflow" alembic upgrade head
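A backgrounded port-forward keeps running after the migration finishes. One way to tie its lifetime to the command is a small wrapper like the sketch below. The function name with_port_forward and the PF_WAIT_SECONDS variable are hypothetical, and the fixed sleep is a crude readiness wait, not a real health check.

```shell
# with_port_forward TARGET LOCAL_PORT COMMAND [ARGS...]
# Forwards LOCAL_PORT to port 5432 of TARGET, runs COMMAND, then stops the
# port-forward and returns COMMAND's exit status.
# Assumption: PF_WAIT_SECONDS (default 3) is long enough for the tunnel to open.
with_port_forward() {
  pf_target=$1
  pf_port=$2
  shift 2
  kubectl port-forward "$pf_target" "${pf_port}:5432" >/dev/null 2>&1 &
  pf_pid=$!
  sleep "${PF_WAIT_SECONDS:-3}"
  pf_status=0
  "$@" || pf_status=$?
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
  return "$pf_status"
}
```

With DATABASE_URL exported, the migration could then be run as `with_port_forward svc/pg-cluster-rw 5432 alembic upgrade head`.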
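Backup Configuration
Backing up to S3
Note: in-tree barmanObjectStore support has been deprecated since CloudNativePG 1.26 in favor of the Barman Cloud Plugin; check the release notes for your operator version before relying on it.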
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: pg-cluster
spec:
  instances: 3
  storage:
    size: 10Gi
  backup:
    barmanObjectStore:
      destinationPath: "s3://my-bucket/pg-backups"
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: SECRET_ACCESS_KEY
      wal:
        compression: gzip
      data:
        compression: gzip
    retentionPolicy: "30d"
Scheduled backups
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: pg-backup-daily
spec:
  schedule: "0 0 0 * * *"  # daily at midnight (six fields; the first is seconds)
  backupOwnerReference: cluster
  cluster:
    name: pg-cluster
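The ScheduledBackup schedule field uses a six-field cron format whose first field is seconds, unlike the five-field Unix crontab format. The sketch below converts a familiar five-field expression by prepending a seconds field of 0. The function name to_cnpg_schedule is hypothetical.

```shell
# to_cnpg_schedule "MIN HOUR DOM MON DOW"
# Prints the six-field form (seconds first) of a five-field cron expression.
to_cnpg_schedule() {
  # Split the expression into fields with globbing disabled, so that "*"
  # is kept literally instead of being expanded to file names.
  set -f
  set -- $1
  set +f
  if [ "$#" -ne 5 ]; then
    echo "expected a five-field cron expression" >&2
    return 1
  fi
  printf '0 %s %s %s %s %s\n' "$@"
}
```

For example, `to_cnpg_schedule "30 2 * * 1"` prints the six-field schedule for 02:30 every Monday.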
Monitoring
Checking cluster status
kubectl get cluster pg-cluster
kubectl describe cluster pg-cluster
kubectl get pods -l cnpg.io/cluster=pg-cluster
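For scripts and CI checks, the same information can be condensed into one line with a meaningful exit status. The sketch below reads fields from the Cluster resource with jsonpath. The function name cluster_ready_summary is hypothetical, and it assumes the resource exposes status.readyInstances and status.currentPrimary.

```shell
# cluster_ready_summary CLUSTER_NAME
# Prints "<name>: <ready>/<desired> instances ready, primary=<pod>" and
# returns non-zero unless every desired instance is ready.
# Assumption: status.readyInstances and status.currentPrimary are populated.
cluster_ready_summary() {
  crs_cluster=$1
  crs_ready=$(kubectl get cluster "$crs_cluster" -o "jsonpath={.status.readyInstances}")
  crs_desired=$(kubectl get cluster "$crs_cluster" -o "jsonpath={.spec.instances}")
  crs_primary=$(kubectl get cluster "$crs_cluster" -o "jsonpath={.status.currentPrimary}")
  echo "${crs_cluster}: ${crs_ready:-0}/${crs_desired} instances ready, primary=${crs_primary:-none}"
  [ "${crs_ready:-0}" = "$crs_desired" ]
}
```

A call such as `cluster_ready_summary pg-cluster || echo "cluster degraded"` can gate a deployment step.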
Viewing logs
kubectl logs pg-cluster-1 -f
kubectl logs -l cnpg.io/cluster=pg-cluster --all-containers
Prometheus metrics
With enablePodMonitor: true set, the following metrics are available:
- cnpg_backends_total - active connections
- cnpg_pg_replication_lag - replica lag, in seconds
- cnpg_pg_database_size_bytes - database size
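Without a Prometheus stack, the metrics can be inspected directly from an instance's exporter. The sketch below filters Prometheus text output down to the three metric families listed above, matching by name prefix. The function name filter_cnpg_metrics is hypothetical, and the usage note assumes the exporter serves plain HTTP on port 9187.

```shell
# filter_cnpg_metrics
# Reads Prometheus text format on stdin and keeps only samples whose metric
# name starts with one of the three families of interest.
filter_cnpg_metrics() {
  grep -E '^(cnpg_backends_total|cnpg_pg_replication_lag|cnpg_pg_database_size_bytes)'
}
```

Under those assumptions, run `kubectl port-forward pod/pg-cluster-1 9187:9187 &` and then `curl -s http://localhost:9187/metrics | filter_cnpg_metrics`.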
Troubleshooting
Cluster not ready
kubectl describe cluster pg-cluster
kubectl get pods -l cnpg.io/cluster=pg-cluster
kubectl logs pg-cluster-1
Connection problems
# Test connectivity
kubectl run pg-client --rm -it --restart=Never \
--image=postgres:17 -- \
psql "postgresql://app_user:password@pg-cluster-rw:5432/learnflow" -c "SELECT 1;"
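Common issues
| Error | Cause | Fix |
|---|---|---|
| PVC pending | No storage class | Add storageClass to the spec |
| Connection refused | Wrong service name | Use pg-cluster-rw for writes |
| Authentication failed | Wrong credentials | Check the pg-cluster-app secret |
| High replica lag | Heavy write load | Scale up and add resources |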
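For the pending-PVC case, it helps to confirm whether the cluster has a default StorageClass before editing the manifest. The sketch below uses a widely used jsonpath filter on the is-default-class annotation. The function names default_storage_class and explain_pending_pvc are hypothetical.

```shell
# default_storage_class
# Prints the name of the StorageClass annotated as the cluster default, if any.
default_storage_class() {
  kubectl get storageclass -o \
    'jsonpath={.items[?(@.metadata.annotations.storageclass\.kubernetes\.io/is-default-class=="true")].metadata.name}'
}

# explain_pending_pvc
# Reports whether a default StorageClass exists; returns non-zero if not.
explain_pending_pvc() {
  epp_class=$(default_storage_class)
  if [ -n "$epp_class" ]; then
    echo "default StorageClass: $epp_class"
  else
    echo "no default StorageClass; set spec.storage.storageClass explicitly"
    return 1
  fi
}
```

If explain_pending_pvc reports no default, `kubectl get storageclass` lists the names that can be set in the cluster spec.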
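Cleanup
# Delete the cluster (its PVCs are owned by the Cluster resource and are normally removed with it)
kubectl delete cluster pg-cluster
# Delete any PVCs that remain (data loss!)
kubectl delete pvc -l cnpg.io/cluster=pg-cluster
# Remove the operator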
kubectl delete -f \
https://raw.githubusercontent.com/cloudnative-pg/cloudnative-pg/release-1.28/releases/cnpg-1.28.0.yaml
Verification
Run: python scripts/verify.py
Related Skills
- operating-k8s-local - local Minikube cluster setup
- scaffolding-fastapi-dapr - FastAPI services with SQLModel
- deploying-kafka-k8s - Kafka for event-driven architectures