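name: azure-infrastructure
description: Azure cloud infrastructure inspection. Use to investigate Azure virtual machines, AKS clusters, Log Analytics (KQL), monitor metrics/alerts, cost management, or NSG rules.
allowed-tools: Bash(python *)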
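Azure Infrastructure
Authentication
Important: credentials are injected automatically by the proxy layer. Do not look for AZURE_CLIENT_SECRET or AZURE_TENANT_ID in environment variables; they are not visible to you. Run the scripts directly; authentication is handled transparently.
Configuration environment variables you can inspect (non-secret):
AZURE_SUBSCRIPTION_ID - Azure subscription ID
AZURE_RESOURCE_GROUP - default resource group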
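A minimal sketch of how a helper might resolve the investigation scope from these two non-secret variables, letting an explicit resource group override the default. The function name `resolve_scope` is hypothetical and not part of the scripts listed below; it deliberately never reads any secret variable.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_scope(
    resource_group: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (subscription_id, resource_group) from arguments or non-secret env vars.

    Hypothetical helper: an explicit resource_group argument wins over
    AZURE_RESOURCE_GROUP. Secrets such as AZURE_CLIENT_SECRET are never read.
    """
    subscription_id = env.get("AZURE_SUBSCRIPTION_ID") or None
    rg = resource_group or env.get("AZURE_RESOURCE_GROUP") or None
    return subscription_id, rg


if __name__ == "__main__":
    sub, rg = resolve_scope()
    print(f"subscription={sub} resource_group={rg}")
```

Passing `env` explicitly keeps the helper testable without touching the real process environment.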
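Mandatory: query-first investigation
Start with Log Analytics or monitor metrics, then drill down into specific resources.
LOG ANALYTICS / METRICS → identify resource → describe resource → check alerts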
Available scripts
All scripts are located in .claude/skills/infrastructure-azure/scripts/
query_log_analytics.py - KQL log queries (start log investigations here)
python .claude/skills/infrastructure-azure/scripts/query_log_analytics.py --workspace-id WORKSPACE_ID --query "AzureDiagnostics | where Level == 'Error' | limit 50"
python .claude/skills/infrastructure-azure/scripts/query_log_analytics.py --workspace-id WORKSPACE_ID --query "Heartbeat | summarize count() by Computer" --timespan PT1H
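The `--timespan` value (and `--interval` for metrics) is an ISO 8601 duration such as PT1H or PT5M. The sketch below, with a hypothetical `iso8601_duration` helper, converts a Python `timedelta` into that form; sub-second parts are dropped, and whether the scripts accept every ISO 8601 form (for example day-based P1D) is an assumption, since only PT1H and PT5M appear above.

```python
from datetime import timedelta


def iso8601_duration(delta: timedelta) -> str:
    """Format a positive timedelta as an ISO 8601 duration (e.g. PT1H, P1DT30M)."""
    total = int(delta.total_seconds())  # whole seconds; fractions are truncated
    if total <= 0:
        raise ValueError("duration must be positive")
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, seconds = divmod(rem, 60)
    out = f"P{days}D" if days else "P"
    time_part = ""
    if hours:
        time_part += f"{hours}H"
    if minutes:
        time_part += f"{minutes}M"
    if seconds:
        time_part += f"{seconds}S"
    if time_part:
        out += "T" + time_part  # the T separator precedes hour/minute/second fields
    return out
```

For example, `iso8601_duration(timedelta(hours=1))` yields the PT1H used in the Heartbeat query above.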
query_resource_graph.py - cross-subscription resource queries
python .claude/skills/infrastructure-azure/scripts/query_resource_graph.py --query "Resources | where type == 'microsoft.compute/virtualmachines' | project name, location"
get_monitor_metrics.py - Azure Monitor metrics
python .claude/skills/infrastructure-azure/scripts/get_monitor_metrics.py --resource-id RESOURCE_ID --metrics "Percentage CPU,Network In" --interval PT5M
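`--resource-id` expects a full Azure Resource Manager ID. A sketch, assuming a virtual machine as the target, of building one in the standard `/subscriptions/.../resourceGroups/.../providers/...` shape and splitting an existing ID back into its segments; both function names are hypothetical.

```python
def vm_resource_id(subscription_id: str, resource_group: str, vm_name: str) -> str:
    """Build the ARM resource ID of a virtual machine."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Compute/virtualMachines/{vm_name}"
    )


def parse_resource_id(resource_id: str) -> dict:
    """Split an ARM resource ID into key/value segments (keys lowercased)."""
    parts = resource_id.strip("/").split("/")
    if len(parts) % 2:
        raise ValueError(f"unexpected resource ID: {resource_id}")
    # Segments alternate: collection name, then the value for that collection.
    return {parts[i].lower(): parts[i + 1] for i in range(0, len(parts), 2)}
```

Lowercasing the keys sidesteps casing differences such as `resourceGroups` versus `resourcegroups` between tools.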
list_monitor_alerts.py - alert rules
python .claude/skills/infrastructure-azure/scripts/list_monitor_alerts.py [--resource-group RG]
list_vms.py / describe_vm.py - virtual machines
python .claude/skills/infrastructure-azure/scripts/list_vms.py [--resource-group RG]
python .claude/skills/infrastructure-azure/scripts/describe_vm.py --resource-group RG --vm-name VM
list_aks_clusters.py / describe_aks_cluster.py - AKS clusters
python .claude/skills/infrastructure-azure/scripts/list_aks_clusters.py [--resource-group RG]
python .claude/skills/infrastructure-azure/scripts/describe_aks_cluster.py --resource-group RG --cluster-name CLUSTER
query_costs.py - cost management
python .claude/skills/infrastructure-azure/scripts/query_costs.py --start 2026-01-01 --end 2026-02-01 [--granularity Monthly] [--group-by ResourceGroup,ServiceName]
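The example above spans one calendar month as `--start 2026-01-01 --end 2026-02-01`, from the first day of a month to the first day of the next. Whether `--end` is inclusive is not stated here; this sketch, with hypothetical helper names, only reproduces the example's shape so a month-over-month comparison can be scripted.

```python
from datetime import date
from typing import Tuple


def month_window(year: int, month: int) -> Tuple[str, str]:
    """Return (start, end) as YYYY-MM-DD: first day of the month and of the next month."""
    start = date(year, month, 1)
    next_year = year + 1 if month == 12 else year
    end = date(next_year, month % 12 + 1, 1)
    return start.isoformat(), end.isoformat()


def previous_month_window(today: date) -> Tuple[str, str]:
    """Window covering the full calendar month before `today`."""
    if today.month == 1:
        return month_window(today.year - 1, 12)
    return month_window(today.year, today.month - 1)
```

`month_window(2026, 1)` gives the same pair as the command above, which can be passed as `--start` and `--end`.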
get_nsg_rules.py - network security group rules
python .claude/skills/infrastructure-azure/scripts/get_nsg_rules.py --resource-group RG --nsg-name NSG
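When reading NSG rules, remember that Azure evaluates them in priority order (lower number first) and stops at the first match. A simplified sketch of that first-match logic: the rule dictionaries and their keys are a hypothetical shape, not the documented output of get_nsg_rules.py, and only direction and destination port are checked (no protocol or address prefixes).

```python
from typing import Iterable, Mapping, Optional


def _port_matches(spec: str, port: int) -> bool:
    """Match '*', a single port like '443', or a range like '8000-8100'."""
    if spec == "*":
        return True
    if "-" in spec:
        low, high = (int(p) for p in spec.split("-", 1))
        return low <= port <= high
    return int(spec) == port


def first_matching_rule(
    rules: Iterable[Mapping], direction: str, port: int
) -> Optional[Mapping]:
    """Return the deciding rule: lowest priority number first, first match wins."""
    candidates = [r for r in rules if r["direction"] == direction]
    for rule in sorted(candidates, key=lambda r: r["priority"]):
        if _port_matches(rule["destination_port"], port):
            return rule
    return None  # nothing in the given list matched
```

Real NSGs also carry default rules at high priority numbers; this sketch only sees the rules it is given.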
KQL query reference
// Errors in the last hour
AzureDiagnostics | where Level == "Error" | where TimeGenerated > ago(1h) | limit 50
// CPU usage
Perf | where CounterName == "% Processor Time" | summarize avg(CounterValue) by bin(TimeGenerated, 5m), Computer
// Heartbeat (availability)
Heartbeat | summarize count() by Computer, bin(TimeGenerated, 1h)
// Resource Graph - find virtual machines
Resources | where type == "microsoft.compute/virtualmachines" | project name, location, properties.hardwareProfile.vmSize
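When these queries are built programmatically, values such as a computer name should be quoted as KQL string literals, in which backslash is the escape character. A sketch with hypothetical helpers that escapes a value and scopes the CPU query above to one computer; `lookback` is inserted as-is, so it must already be a valid KQL timespan such as `1h`.

```python
def kql_string(value: str) -> str:
    """Quote a value as a single-quoted KQL string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def cpu_query_for(computer: str, lookback: str = "1h") -> str:
    """Per-computer variant of the Perf CPU query above (hypothetical helper)."""
    return (
        "Perf"
        f" | where TimeGenerated > ago({lookback})"
        f" | where Computer == {kql_string(computer)}"
        " | where CounterName == '% Processor Time'"
        " | summarize avg(CounterValue) by bin(TimeGenerated, 5m)"
    )
```

The result can be passed unchanged to `query_log_analytics.py --query`.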
Investigation workflows
VM performance issues
1. get_monitor_metrics.py --resource-id <vm-id> --metrics "Percentage CPU,Network In"
2. query_log_analytics.py --workspace-id <workspace-id> --query "Perf | where Computer == '<vm>' | where CounterName == '% Processor Time'"
3. describe_vm.py --resource-group <rg> --vm-name <vm>
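The three steps above can be scripted as argument vectors, which avoids shell-quoting problems with the KQL query. A sketch with hypothetical function names: it assumes the Perf `Computer` column matches the VM name (it holds the guest hostname, which may differ), and it only prints the commands unless `dry_run=False`.

```python
import subprocess
from typing import List

SCRIPTS = ".claude/skills/infrastructure-azure/scripts"


def _kql_string(value: str) -> str:
    # Single-quoted KQL literal; backslash is KQL's escape character.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def vm_perf_steps(
    vm_id: str, vm_name: str, resource_group: str, workspace_id: str
) -> List[List[str]]:
    """Argument vectors for the three VM-performance steps, in order."""
    perf_query = (
        f"Perf | where Computer == {_kql_string(vm_name)}"
        " | where CounterName == '% Processor Time'"
    )
    return [
        ["python", f"{SCRIPTS}/get_monitor_metrics.py", "--resource-id", vm_id,
         "--metrics", "Percentage CPU,Network In"],
        ["python", f"{SCRIPTS}/query_log_analytics.py", "--workspace-id", workspace_id,
         "--query", perf_query],
        ["python", f"{SCRIPTS}/describe_vm.py", "--resource-group", resource_group,
         "--vm-name", vm_name],
    ]


def run_steps(steps: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or run it (stopping on the first failure) when dry_run is False."""
    for argv in steps:
        if dry_run:
            print(" ".join(argv))  # display only; not shell-quoted
        else:
            subprocess.run(argv, check=True)
```

Keeping metrics first mirrors the query-first rule: the cheap aggregate signal comes before the per-resource description.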
Cost spike
1. query_costs.py --start <start> --end <end> --group-by ResourceGroup,ServiceName
2. query_resource_graph.py --query "Resources | summarize count() by type, location"
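The output format of query_costs.py is not documented here, so this sketch assumes two runs (a baseline period and the spike period) have already been reduced to dictionaries mapping a group key such as resource group/service to cost. It flags groups that grew past a ratio threshold or appeared as new spend; the function name and default thresholds are hypothetical.

```python
from typing import Dict, List, Tuple


def cost_increases(
    previous: Dict[str, float],
    current: Dict[str, float],
    min_ratio: float = 1.5,
    min_delta: float = 0.0,
) -> List[Tuple[str, float, float]]:
    """Return (group, previous, current) for groups whose cost grew past the thresholds.

    A group absent from `previous` counts as new spend and is flagged whenever
    its increase exceeds `min_delta`. Results are sorted by absolute increase.
    """
    flagged = []
    for group, now in current.items():
        before = previous.get(group, 0.0)
        delta = now - before
        if delta <= min_delta:
            continue  # flat or decreasing spend
        if before == 0.0 or now / before >= min_ratio:
            flagged.append((group, before, now))
    return sorted(flagged, key=lambda item: item[2] - item[1], reverse=True)
```

The top entries point at the resource groups and services to inspect next with the Resource Graph query above.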