Chaos Engineering Runner (chaos-runner)

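The Chaos Engineering Runner is a specialized tool for proactively injecting faults into distributed systems and verifying their resilience. It supports mainstream chaos engineering frameworks such as Chaos Monkey, Litmus, and Gremlin, and can run fault scenarios including pod termination, network latency, CPU stress, and memory stress. By controlling the blast radius, validating steady state, and rolling back automatically, it helps developers and operations teams find system weaknesses and improve availability and fault tolerance. Keywords: chaos engineering, fault injection, system resilience, high availability, fault-tolerance testing, Litmus, Gremlin, Chaos Monkey, Kubernetes, DevOps.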

Category: DevOps. Updated on 2/26/2026.

name: chaos-runner
description: Run chaos engineering experiments with Chaos Monkey, Litmus, or Gremlin
allowed-tools:

  • Bash
  • Read
  • Write
  • Glob

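Chaos Engineering Runner Skill

Overview

Run chaos engineering experiments with Chaos Monkey, Litmus, or Gremlin, including fault injection scenarios, blast radius control, and experiment analysis.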

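Capabilities

  • Run Chaos Monkey experiments
  • Execute Litmus chaos experiments
  • Gremlin integration
  • Fault injection scenarios
  • Blast radius control
  • Steady-state validation
  • Experiment rollback
  • Result analysis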
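
Blast radius control can be illustrated with a minimal sketch. The source does not describe how the skill picks targets, so everything here is an assumption: the helper name `select_targets`, the round-down rule, and the seeded random choice are hypothetical and only show one plausible way to turn a `percentage` value into a concrete set of pods.

```python
import math
import random


def select_targets(pods, percentage, seed=None):
    """Pick the subset of pods an experiment may affect (hypothetical helper)."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    if not pods or percentage == 0:
        return []
    # Round down to stay at or under the requested share. The minimum of one
    # pod is an exception so that a nonzero percentage always selects something.
    count = max(1, math.floor(len(pods) * percentage / 100))
    rng = random.Random(seed)  # a fixed seed makes the selection reproducible
    return sorted(rng.sample(list(pods), count))


pods = [f"api-service-{i}" for i in range(10)]
print(select_targets(pods, 50, seed=42))  # five of the ten pod names
```

Rounding down is the conservative choice here: with 3 matching pods and a percentage of 50, only one pod is selected rather than two.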

Target Processes

  • Resilience patterns

Input Schema

{
  "type": "object",
  "required": ["experiment"],
  "properties": {
    "experiment": {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "type": {
          "type": "string",
          "enum": ["pod-kill", "network-latency", "cpu-stress", "memory-stress", "disk-fill", "node-drain"]
        },
        "target": {
          "type": "object",
          "properties": {
            "namespace": { "type": "string" },
            "labelSelector": { "type": "object" },
            "percentage": { "type": "number" }
          }
        },
        "duration": { "type": "string" }
      }
    },
    "framework": {
      "type": "string",
      "enum": ["litmus", "gremlin", "chaos-monkey", "toxiproxy"],
      "default": "litmus"
    },
    "steadyState": {
      "type": "object",
      "properties": {
        "probes": { "type": "array" },
        "assertions": { "type": "array" }
      }
    },
    "options": {
      "type": "object",
      "properties": {
        "dryRun": {
          "type": "boolean",
          "default": true
        },
        "autoRollback": {
          "type": "boolean",
          "default": true
        },
        "notifyOnFailure": {
          "type": "boolean",
          "default": true
        }
      }
    }
  }
}
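
In JSON Schema, `default` is an annotation, so a validator does not fill in missing values on its own. The sketch below shows one way a caller might check a request against the schema above and apply the documented defaults by hand. The function name `normalize_request` is hypothetical, and the checks cover only the `required`, `enum`, and `default` rules shown in the schema.

```python
import copy

# Values copied from the input schema above.
EXPERIMENT_TYPES = {
    "pod-kill", "network-latency", "cpu-stress",
    "memory-stress", "disk-fill", "node-drain",
}
FRAMEWORKS = {"litmus", "gremlin", "chaos-monkey", "toxiproxy"}
OPTION_DEFAULTS = {"dryRun": True, "autoRollback": True, "notifyOnFailure": True}


def normalize_request(request):
    """Validate a request and return a copy with schema defaults applied."""
    if not isinstance(request.get("experiment"), dict):
        raise ValueError("'experiment' is required and must be an object")
    normalized = copy.deepcopy(request)  # never mutate the caller's data

    exp_type = normalized["experiment"].get("type")
    if exp_type is not None and exp_type not in EXPERIMENT_TYPES:
        raise ValueError(f"unsupported experiment type: {exp_type!r}")

    normalized.setdefault("framework", "litmus")
    if normalized["framework"] not in FRAMEWORKS:
        raise ValueError(f"unsupported framework: {normalized['framework']!r}")

    # Explicit options win over defaults; dryRun stays True unless overridden.
    normalized["options"] = {**OPTION_DEFAULTS, **normalized.get("options", {})}
    return normalized


request = {"experiment": {"name": "pod-failure-test", "type": "pod-kill"}}
print(normalize_request(request)["options"])
```

Because `dryRun` defaults to `true`, a request that omits `options` only simulates the experiment; a real run has to opt out explicitly.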

Output Schema

{
  "type": "object",
  "properties": {
    "experimentId": {
      "type": "string"
    },
    "status": {
      "type": "string",
      "enum": ["passed", "failed", "aborted"]
    },
    "steadyStateValidation": {
      "type": "object",
      "properties": {
        "before": { "type": "boolean" },
        "during": { "type": "boolean" },
        "after": { "type": "boolean" }
      }
    },
    "metrics": {
      "type": "object",
      "properties": {
        "affectedPods": { "type": "number" },
        "recoveryTime": { "type": "string" },
        "errorRate": { "type": "number" }
      }
    },
    "findings": {
      "type": "array"
    },
    "recommendations": {
      "type": "array"
    }
  }
}
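
The schema lists the `status` values but does not say how they relate to `steadyStateValidation`. The sketch below is one possible mapping, stated as an assumption rather than the skill's actual rule: a failed check before injection aborts the experiment, and a failed check during or after injection marks it as failed. The function names `derive_status` and `build_result` are hypothetical.

```python
def derive_status(before, during, after):
    """Map steady-state checks to a status (assumed rule, not from the source)."""
    if not before:
        # The system was already unhealthy, so no fault should be injected.
        return "aborted"
    if during and after:
        return "passed"
    return "failed"


def build_result(experiment_id, before, during, after, metrics):
    """Assemble a result object shaped like the output schema above."""
    return {
        "experimentId": experiment_id,
        "status": derive_status(before, during, after),
        "steadyStateValidation": {"before": before, "during": during, "after": after},
        "metrics": metrics,
        "findings": [],
        "recommendations": [],
    }


result = build_result(
    "exp-001",  # made-up identifier for illustration
    before=True, during=False, after=True,
    metrics={"affectedPods": 5, "recoveryTime": "42s", "errorRate": 0.03},
)
print(result["status"])  # failed
```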

Usage Example

{
  kind: 'skill',
  skill: {
    name: 'chaos-runner',
    context: {
      experiment: {
        name: 'pod-failure-test',
        type: 'pod-kill',
        target: {
          namespace: 'production',
          labelSelector: { app: 'api-service' },
          percentage: 50
        },
        duration: '5m'
      },
      framework: 'litmus',
      steadyState: {
        probes: [{ type: 'http', endpoint: '/health' }],
        assertions: [{ metric: 'error_rate', operator: '<', value: 0.01 }]
      },
      options: {
        dryRun: false,
        autoRollback: true
      }
    }
  }
}
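
The `assertions` entry in the example compares a metric against a threshold. A minimal sketch of how such assertions could be evaluated against observed metrics follows. Only the `<` operator appears in the source; the other operators, the function name `evaluate_assertions`, and the shape of the `observed` dictionary are assumptions made for illustration.

```python
import operator

# Only "<" appears in the example above; the rest are assumed extensions.
OPERATORS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


def evaluate_assertions(assertions, observed):
    """Return a list of failure messages; an empty list means steady state holds."""
    failures = []
    for assertion in assertions:
        metric = assertion["metric"]
        symbol = assertion["operator"]
        if symbol not in OPERATORS:
            raise ValueError(f"unsupported operator: {symbol!r}")
        if metric not in observed:
            failures.append(f"{metric}: no observed value")
        elif not OPERATORS[symbol](observed[metric], assertion["value"]):
            failures.append(
                f"{metric}: {observed[metric]} is not {symbol} {assertion['value']}"
            )
    return failures


assertions = [{"metric": "error_rate", "operator": "<", "value": 0.01}]
print(evaluate_assertions(assertions, {"error_rate": 0.002}))  # []
print(evaluate_assertions(assertions, {"error_rate": 0.05}))
```

Treating a missing metric as a failure is a deliberate choice in this sketch: if the probe cannot report a value, steady state cannot be confirmed.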