Grafana Dashboard (grafana-dashboard)

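This skill covers designing and implementing comprehensive Grafana dashboards with multiple visualization types, template variables, and drill-down navigation for operational monitoring. Keywords: monitoring dashboard, data visualization, time-series data, operational insights.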

Data visualization · Updated 3/4/2026

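name: grafana-dashboard
description: Create professional Grafana dashboards with visualizations, templates, and alerts. Use when building monitoring dashboards, creating data visualizations, or setting up operational insights.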

Grafana Dashboard

Overview

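Design and implement comprehensive Grafana dashboards with multiple visualization types, template variables, and drill-down navigation for operational monitoring.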

When to Use

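  • Creating monitoring dashboards
  • Building operational insights
  • Visualizing time-series data
  • Creating drill-down dashboards
  • Sharing metrics with stakeholders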

Guide

1. Grafana Dashboard JSON

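{
  "dashboard": {
    "uid": "app-performance",
    "title": "Application Performance",
    "description": "Real-time application metrics",
    "tags": ["production", "performance"],
    "timezone": "utc",
    "refresh": "30s",
    "templating": {
      "list": [
        {
          "name": "datasource",
          "type": "datasource",
          "query": "prometheus"
        },
        {
          "name": "service",
          "type": "query",
          "datasource": {"type": "prometheus", "uid": "${datasource}"},
          "query": "label_values(requests_total, service)",
          "refresh": 1
        }
      ]
    },
    "panels": [
      {
        "id": 1,
        "title": "Request Rate",
        "type": "timeseries",
        "datasource": {"type": "prometheus", "uid": "${datasource}"},
        "gridPos": {"x": 0, "y": 0, "w": 12, "h": 8},
        "targets": [
          {
            "refId": "A",
            "expr": "sum by (method) (rate(requests_total{service=\"$service\"}[5m]))",
            "legendFormat": "{{method}}"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "unit": "reqps",
            "custom": {"axisLabel": "Requests per second"}
          }
        }
      },
      {
        "id": 2,
        "title": "Error Rate",
        "type": "timeseries",
        "datasource": {"type": "prometheus", "uid": "${datasource}"},
        "gridPos": {"x": 12, "y": 0, "w": 12, "h": 8},
        "targets": [
          {
            "refId": "A",
            "expr": "sum(rate(requests_total{status_code=~\"5..\",service=\"$service\"}[5m])) / sum(rate(requests_total{service=\"$service\"}[5m]))",
            "legendFormat": "error rate"
          }
        ],
        "fieldConfig": {"defaults": {"unit": "percentunit"}}
      },
      {
        "id": 3,
        "title": "Response Latency (p95)",
        "type": "timeseries",
        "datasource": {"type": "prometheus", "uid": "${datasource}"},
        "gridPos": {"x": 0, "y": 8, "w": 12, "h": 8},
        "targets": [
          {
            "refId": "A",
            "expr": "histogram_quantile(0.95, sum by (le) (rate(request_duration_seconds_bucket{service=\"$service\"}[5m])))",
            "legendFormat": "p95"
          }
        ],
        "fieldConfig": {"defaults": {"unit": "s"}}
      },
      {
        "id": 4,
        "title": "Active Connections",
        "type": "stat",
        "datasource": {"type": "prometheus", "uid": "${datasource}"},
        "gridPos": {"x": 12, "y": 8, "w": 12, "h": 8},
        "targets": [
          {
            "refId": "A",
            "expr": "sum(active_connections{service=\"$service\"})"
          }
        ]
      }
    ]
  }
}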
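Grafana places panels on a 24-column grid, so `gridPos` values must be recomputed by hand whenever panels are added or reordered. The sketch below computes them automatically. It assumes every panel has the same size and that panels are placed two per row, as in the JSON above. `layoutPanels` and its `perRow`/`height` options are hypothetical helpers, not part of any Grafana API.

```javascript
// Hypothetical helper: assigns sequential ids and gridPos values in a fixed-size grid.
// Grafana dashboards are 24 columns wide, so `perRow` must divide 24 evenly here.
const GRID_COLUMNS = 24;

function layoutPanels(panels, { perRow = 2, height = 8 } = {}) {
  if (GRID_COLUMNS % perRow !== 0) {
    throw new Error(`perRow must divide ${GRID_COLUMNS}`);
  }
  const width = GRID_COLUMNS / perRow;
  return panels.map((panel, index) => ({
    ...panel,
    id: index + 1,
    gridPos: {
      x: (index % perRow) * width,          // column offset within the row
      y: Math.floor(index / perRow) * height, // row offset in grid units
      w: width,
      h: height,
    },
  }));
}

// Usage: reproduces the 2x2 layout of the dashboard above.
const panels = layoutPanels([
  { title: 'Request Rate', type: 'timeseries' },
  { title: 'Error Rate', type: 'timeseries' },
  { title: 'Response Latency (p95)', type: 'timeseries' },
  { title: 'Active Connections', type: 'stat' },
]);
console.log(panels.map((p) => p.gridPos));
```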

2. Grafana Provisioning Files

# /etc/grafana/provisioning/dashboards/dashboards.yaml
apiVersion: 1

providers:
  - name: 'dashboards'
    orgId: 1
    folder: 'Production'
    type: file
    disableDeletion: false
    updateIntervalSeconds: 10
    options:
      # Each *.json file here must hold a bare dashboard model (no "dashboard" wrapper)
      path: /var/lib/grafana/dashboards

# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    uid: prometheus  # stable uid so dashboards and alert rules can reference it
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    isDefault: true
    editable: true
    jsonData:
      timeInterval: '30s'
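The file provider loads every `.json` file under `options.path`. Each file must contain the bare dashboard model, not the `{"dashboard": ...}` wrapper that the HTTP API expects and that the JSON in step 1 uses. The Node.js sketch below unwraps such a payload and writes it to that directory. It assumes dashboards carry a `uid`, which it uses as the file name. `writeProvisionedDashboard` is a hypothetical helper.

```javascript
// Hypothetical helper: writes a dashboard where Grafana's file provider can load it.
const fs = require('node:fs');
const path = require('node:path');

function writeProvisionedDashboard(payload, dir) {
  // Accept either an API-style payload ({ dashboard: {...} }) or a bare dashboard model.
  const dashboard = payload.dashboard ?? payload;
  if (!dashboard.uid) {
    throw new Error('dashboard.uid is required to derive a stable file name');
  }
  // Drop the numeric id: Grafana assigns it per instance.
  const { id, ...model } = dashboard;
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${dashboard.uid}.json`);
  fs.writeFileSync(file, JSON.stringify(model, null, 2) + '\n');
  return file;
}

// Usage (directory matches the provider's options.path above):
// writeProvisionedDashboard(require('./app-performance.json'), '/var/lib/grafana/dashboards');
```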

3. Grafana Alerting Configuration

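# /etc/grafana/provisioning/alerting/alerts.yaml
apiVersion: 1
groups:
  - orgId: 1
    name: application_alerts
    folder: Production
    interval: 1m
    rules:
      - uid: alert_high_error_rate
        title: High error rate
        condition: B
        data:
          # A: share of requests that returned 5xx over the last 5 minutes
          - refId: A
            relativeTimeRange:
              from: 600
              to: 0
            datasourceUid: prometheus  # uid of the Prometheus data source
            model:
              refId: A
              instant: true
              expr: 'sum(rate(requests_total{status_code=~"5.."}[5m])) / sum(rate(requests_total[5m]))'
          # B: condition that fires when A exceeds 5%
          - refId: B
            datasourceUid: __expr__
            model:
              refId: B
              type: threshold
              expression: A
              conditions:
                - evaluator:
                    params: [0.05]
                    type: gt
        noDataState: NoData
        execErrState: Error
        for: 5m
        annotations:
          description: 'Error rate is {{ $values.A }}'
        labels:
          severity: critical
          team: platform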
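In this rule, `interval: 1m` sets how often the condition is evaluated. `for: 5m` sets how long the condition must stay true before the alert fires; until then, the rule is only pending. The sketch below is a simplified model of that state progression. It ignores NoData/Error handling, and the function name and its options are hypothetical.

```javascript
// Simplified model of how `interval` and `for` interact (ignores NoData/Error states).
// evaluations: [{ at: seconds since start, value: error ratio }] in evaluation order.
function simulateAlert(evaluations, { threshold = 0.05, forSeconds = 300 } = {}) {
  let pendingSince = null;
  return evaluations.map(({ at, value }) => {
    if (!(value > threshold)) {
      pendingSince = null; // condition cleared: reset the pending timer
      return 'Normal';
    }
    if (pendingSince === null) pendingSince = at;
    return at - pendingSince >= forSeconds ? 'Alerting' : 'Pending';
  });
}

// Evaluated every 1m, with the ratio held at 8% (above the 5% threshold):
const states = simulateAlert([0, 60, 120, 180, 240, 300].map((at) => ({ at, value: 0.08 })));
console.log(states);
// → [ 'Pending', 'Pending', 'Pending', 'Pending', 'Pending', 'Alerting' ]
```

A single evaluation below the threshold resets the timer. This is why `for` suppresses short spikes.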

4. Grafana API Client

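// grafana-api-client.js
const axios = require('axios');

class GrafanaClient {
  // token: a service account token (legacy API keys are sent the same way)
  constructor(baseUrl, token) {
    this.baseUrl = baseUrl;
    this.client = axios.create({
      baseURL: baseUrl,
      headers: {
        'Authorization': `Bearer ${token}`,
        'Content-Type': 'application/json'
      }
    });
  }

  // Create or update a dashboard; pass the bare model, not the { dashboard: ... } wrapper
  async createDashboard(dashboard) {
    const response = await this.client.post('/api/dashboards/db', {
      dashboard: dashboard,
      overwrite: true
    });
    return response.data;
  }

  // Returns { dashboard, meta }
  async getDashboard(uid) {
    const response = await this.client.get(`/api/dashboards/uid/${uid}`);
    return response.data;
  }

  // Unified alerting: rules are created through the provisioning API;
  // the legacy /api/alerts endpoint cannot create alert rules
  async createAlert(rule) {
    const response = await this.client.post('/api/v1/provisioning/alert-rules', rule);
    return response.data;
  }

  // Search restricted to dashboards (folders are excluded)
  async listDashboards() {
    const response = await this.client.get('/api/search', { params: { type: 'dash-db' } });
    return response.data;
  }
}

module.exports = GrafanaClient;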
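The client above can also support the "version-control dashboard JSON" practice. The sketch below exports every dashboard as a file-ready JSON string, calling only the `listDashboards` and `getDashboard` methods defined above. It removes `id` and `version`, on the assumption that these fields differ between instances and saves and would only add diff noise. `exportDashboards` is a hypothetical helper. A stub stands in for a live client so the example runs offline.

```javascript
// Hypothetical helper built on GrafanaClient: returns { '<uid>.json': '<json text>' }.
async function exportDashboards(client) {
  const results = await client.listDashboards();
  const files = {};
  for (const hit of results) {
    if (hit.type !== 'dash-db') continue; // the search endpoint can also return folders
    const { dashboard } = await client.getDashboard(hit.uid);
    // Strip fields that change between instances/saves and clutter diffs.
    const { id, version, ...model } = dashboard;
    files[`${hit.uid}.json`] = JSON.stringify(model, null, 2) + '\n';
  }
  return files;
}

// Usage with a stub in place of a live GrafanaClient:
const stub = {
  listDashboards: async () => [
    { uid: 'app-performance', type: 'dash-db' },
    { uid: 'production', type: 'dash-folder' },
  ],
  getDashboard: async (uid) => ({
    dashboard: { id: 3, uid, version: 12, title: 'Application Performance' },
    meta: {},
  }),
};
exportDashboards(stub).then((files) => console.log(Object.keys(files)));
```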

5. Docker Compose Setup

version: '3.8'
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
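    environment:
      # docker compose aborts if GRAFANA_PASSWORD is unset instead of falling back to "admin"
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD:?set GRAFANA_PASSWORD}
      GF_USERS_ALLOW_SIGN_UP: 'false'
      GF_SERVER_ROOT_URL: http://grafana.example.com
    volumes:
      - ./provisioning:/etc/grafana/provisioning
      # Dashboard JSON files loaded by the file provider (options.path)
      - ./dashboards:/var/lib/grafana/dashboards
      - grafana_storage:/var/lib/grafana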
    depends_on:
      - prometheus

  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_storage:/prometheus

volumes:
  grafana_storage:
  prometheus_storage:
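`depends_on` only orders container startup, so Grafana may still be initializing when a script begins calling its API. The sketch below polls Grafana's `/api/health` endpoint until it reports `"database": "ok"` before continuing. It assumes Node 18+ for the global `fetch`; the fetch function is injectable for testing. The retry count and delay are arbitrary defaults, and `waitForGrafana` is a hypothetical helper.

```javascript
// Polls GET /api/health until Grafana reports a healthy database, then resolves with the body.
async function waitForGrafana(baseUrl, { retries = 30, delayMs = 2000, fetchFn = globalThis.fetch } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const res = await fetchFn(`${baseUrl}/api/health`);
      if (res.ok) {
        const body = await res.json();
        if (body.database === 'ok') return body;
      }
    } catch {
      // Connection refused while the container is still starting: try again.
    }
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Grafana at ${baseUrl} not healthy after ${retries} attempts`);
}

// Usage, e.g. before provisioning dashboards through the API client:
// waitForGrafana('http://localhost:3000').then(() => console.log('Grafana is up'));
```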

Best Practices

✅ Do

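  • Use meaningful dashboard titles
  • Add documentation (text) panels
  • Organize panels into rows
  • Use variables for flexibility
  • Set appropriate refresh intervals
  • Include runbook links in alerts
  • Test alerts before deploying them
  • Use a consistent color scheme
  • Keep dashboard JSON under version control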

❌ Don't

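  • Overload dashboards with too many panels
  • Mix different time ranges without a reason
  • Create alerts without runbooks
  • Ignore alert noise
  • Use inconsistent metric naming
  • Set overly frequent refresh intervals
  • Forget to configure data sources
  • Keep default passwords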
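Several of the items above can be checked automatically before a dashboard JSON file is committed: panel count, refresh rate, missing description, and missing variables. The sketch below is a minimal linter for these checks. The limits of 20 panels and a 30-second minimum refresh are illustrative assumptions, not Grafana defaults. `lintDashboard` is a hypothetical helper, and it counts only top-level panels.

```javascript
// Hypothetical lint for a few of the practices above; the limits are assumptions.
const MAX_PANELS = 20;
const MIN_REFRESH_SECONDS = 30;
const UNIT_SECONDS = { s: 1, m: 60, h: 3600, d: 86400 };

// Parses refresh strings such as "30s", "1m", "2h"; returns null if unrecognized.
function parseInterval(text) {
  const match = /^(\d+)([smhd])$/.exec(text);
  return match ? Number(match[1]) * UNIT_SECONDS[match[2]] : null;
}

function lintDashboard(dashboard) {
  const problems = [];
  const panels = dashboard.panels ?? []; // top-level panels only
  if (!dashboard.title) problems.push('missing title');
  if (!dashboard.description) problems.push('missing description');
  if (panels.length > MAX_PANELS) {
    problems.push(`${panels.length} panels (max ${MAX_PANELS}); split the dashboard or use rows`);
  }
  const refresh = dashboard.refresh ? parseInterval(dashboard.refresh) : null;
  if (refresh !== null && refresh < MIN_REFRESH_SECONDS) {
    problems.push(`refresh ${dashboard.refresh} is below ${MIN_REFRESH_SECONDS}s`);
  }
  if ((dashboard.templating?.list ?? []).length === 0) {
    problems.push('no template variables');
  }
  return problems;
}

console.log(lintDashboard({ title: 'Application Performance', refresh: '5s', panels: [] }));
```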

Visualization Types

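  • Time series (formerly Graph): trends over time
  • Stat: single value with thresholds
  • Gauge: percentages or utilization
  • Heatmap: pattern detection (e.g., latency distributions)
  • Bar chart: category comparison
  • Pie chart: composition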
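In dashboard JSON, each visualization is selected by its panel `type` id. The sketch below maps the names in this list to the core panel ids and builds a minimal Prometheus-backed panel. `makePanel` is a hypothetical helper. Mapping "Graph" to `timeseries` reflects the recommendation to replace the legacy `graph` panel, not an alias that Grafana defines.

```javascript
// Visualization names from the list above (lower-cased) -> panel `type` ids.
const PANEL_TYPES = {
  'time series': 'timeseries',
  // The legacy Graph panel has type "graph"; new dashboards should use Time series.
  graph: 'timeseries',
  stat: 'stat',
  gauge: 'gauge',
  heatmap: 'heatmap',
  'bar chart': 'barchart',
  'pie chart': 'piechart',
};

// Hypothetical helper: builds a minimal panel with one PromQL target and an optional unit.
function makePanel(visualization, title, expr, unit) {
  const type = PANEL_TYPES[visualization.toLowerCase()];
  if (!type) throw new Error(`unknown visualization: ${visualization}`);
  const panel = { title, type, targets: [{ refId: 'A', expr }] };
  if (unit) panel.fieldConfig = { defaults: { unit } };
  return panel;
}

// Usage: an error-ratio query like the one in the dashboard, shown as a gauge.
const errorGauge = makePanel(
  'Gauge',
  'Error Rate',
  'sum(rate(requests_total{status_code=~"5.."}[5m])) / sum(rate(requests_total[5m]))',
  'percentunit',
);
console.log(JSON.stringify(errorGauge, null, 2));
```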