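Name: Debugging Techniques
Description: Provides debugging workflows for Python (pdb, debugpy), Go (Delve), Rust (LLDB), and Node.js, including container debugging (kubectl debug, ephemeral containers) and production-safe techniques such as distributed tracing and correlation IDs. Use it when setting breakpoints, debugging containers or pods, debugging remotely, or debugging in production.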

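Debugging Techniques

Purpose

Provides systematic debugging workflows for local, remote, container, and production environments, covering Python, Go, Rust, and Node.js. It includes interactive debuggers, container debugging with ephemeral containers, and production-safe techniques based on correlation IDs and distributed tracing.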

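When to Use This Skill

Use this skill to:

Set breakpoints in Python, Go, Rust, or Node.js code
Debug running containers or Kubernetes pods
Set up remote debugging connections
Debug production issues safely
Inspect goroutines, threads, or async tasks
Analyze core dumps or stack traces
Choose the right debugging tool for a scenario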

Quick Reference by Language

Python Debugging

Built-in: pdb

# Python 3.7+
def buggy_function(x, y):
    breakpoint()  # Pause execution here and drop into pdb
    return x / y

# Older Python versions
import pdb
pdb.set_trace()

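Essential pdb commands:

list (l) - Show the code around the current line
next (n) - Execute the current line, stepping over function calls
step (s) - Execute the current line, stepping into function calls
continue (c) - Resume execution until the next breakpoint
p var - Print the value of a variable or expression
where (w) - Show the stack trace
quit (q) - Exit the debugger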
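
As a small illustration of how breakpoint() reaches pdb: the built-in calls sys.breakpointhook(), which defaults to pdb.set_trace, and setting the environment variable PYTHONBREAKPOINT=0 turns every breakpoint() call into a no-op. The sketch below swaps the hook for a recorder so code containing breakpoint() can run unattended. The helper name run_without_stopping is hypothetical and exists only for this example.

```python
import sys


def buggy_function(x, y):
    breakpoint()  # delegates to sys.breakpointhook(), pdb.set_trace by default
    return x / y


def run_without_stopping(func, *args):
    """Call func with breakpoint() temporarily replaced by a recorder.

    Hypothetical helper for illustration; it is not part of pdb.
    """
    hits = []
    original_hook = sys.breakpointhook
    # The replacement accepts any arguments that breakpoint() forwards.
    sys.breakpointhook = lambda *a, **kw: hits.append("breakpoint reached")
    try:
        result = func(*args)
    finally:
        sys.breakpointhook = original_hook  # always restore the real hook
    return result, hits
```

This can be handy in CI, where a stray breakpoint() would otherwise block the run waiting for input.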

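Enhanced tools:

ipdb - Enhanced pdb with tab completion and syntax highlighting (pip install ipdb)
pudb - Full-screen terminal debugger (pip install pudb)
debugpy - VS Code integration (bundled with the Python extension)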
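
For remote attach with debugpy, one possible pattern is to start the listener only when an environment variable opts in, so the debug port is never opened by accident. This is a minimal sketch under stated assumptions: the variable names DEBUGPY_ENABLE, DEBUGPY_HOST, DEBUGPY_PORT, and DEBUGPY_WAIT are invented for this example, while debugpy.listen() and debugpy.wait_for_client() are the library's own calls.

```python
import os


def maybe_enable_remote_debugger(env=os.environ):
    """Start a debugpy listener only when DEBUGPY_ENABLE=1.

    All environment variable names here are hypothetical conventions.
    Returns True if a listener was started, False otherwise.
    """
    if env.get("DEBUGPY_ENABLE") != "1":
        return False

    import debugpy  # imported lazily so the dependency stays optional

    # Default to loopback; reach it through an SSH tunnel or port-forward.
    host = env.get("DEBUGPY_HOST", "127.0.0.1")
    port = int(env.get("DEBUGPY_PORT", "5678"))
    debugpy.listen((host, port))

    if env.get("DEBUGPY_WAIT") == "1":
        debugpy.wait_for_client()  # block until an IDE attaches
    return True
```

Calling maybe_enable_remote_debugger() early in the program's entry point keeps the debugger out of the normal startup path unless it is explicitly requested.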

Debugging tests:

pytest --pdb  # Drop into the debugger when a test fails

For detailed Python debugging patterns, see references/python-debugging.md.

Go Debugging

Delve - The de facto standard debugger for Go

Installation:

go install github.com/go-delve/delve/cmd/dlv@latest

Basic usage:

dlv debug main.go              # Debug the main package
dlv test github.com/me/pkg     # Debug a test suite
dlv attach <pid>               # Attach to a running process
dlv debug -- --config prod.yaml  # Pass arguments to the program

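Essential commands:

break main.main (b) - Set a breakpoint at a function
break file.go:10 (b) - Set a breakpoint at a line
continue (c) - Resume execution
next (n) - Step over
step (s) - Step into
print x (p) - Print a variable
goroutine (gr) - Show the current goroutine
goroutines (grs) - List all goroutines
goroutines -t - Show goroutine stack traces
stack (bt) - Show the stack trace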

Goroutine debugging:

(dlv) goroutines                 # List all goroutines
(dlv) goroutines -t              # Show stack traces
(dlv) goroutines -with user      # Show only user goroutines
(dlv) goroutine 5                # Switch to goroutine 5

For detailed Go debugging patterns, see references/go-debugging.md.

Rust Debugging

LLDB - Default Rust debugger

Compilation:

cargo build  # Debug builds include symbols by default

Usage:

rust-lldb target/debug/myapp   # LLDB wrapper for Rust
rust-gdb target/debug/myapp    # GDB wrapper (alternative)

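Essential LLDB commands:

breakpoint set -f main.rs -l 10 - Set a breakpoint at a line
breakpoint set -n main - Set a breakpoint at a function
run (r) - Start the program
continue (c) - Resume execution
next (n) - Step over
step (s) - Step into
print variable (p) - Print a variable
frame variable (fr v) - Show local variables
backtrace (bt) - Show the stack trace
thread list - List all threads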

VS Code integration:

Install the CodeLLDB extension (vadimcn.vscode-lldb)
Configure launch.json for the Rust project

For detailed Rust debugging patterns, see references/rust-debugging.md.

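Node.js Debugging

Built-in: node --inspect

Basic usage:

node --inspect-brk app.js       # Start and pause before the first line runs
node --inspect app.js           # Start and run with the inspector enabled
node --inspect=0.0.0.0:9229 app.js  # Specify host and port (0.0.0.0 listens on all interfaces)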

Chrome DevTools:

Open chrome://inspect
Click "Open dedicated DevTools for Node"
Set breakpoints and inspect variables

VS Code integration: configure launch.json:

{
  "type": "node",
  "request": "launch",
  "name": "Launch Program",
  "program": "${workspaceFolder}/app.js"
}

Docker debugging:

EXPOSE 9229
CMD ["node", "--inspect=0.0.0.0:9229", "app.js"]

For detailed Node.js debugging patterns, see references/nodejs-debugging.md.

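Container and Kubernetes Debugging

kubectl debug with Ephemeral Containers

When to use:

The container has crashed (kubectl exec does not work)
The image is distroless or minimal (no shell, no tools)
Debugging tools are needed without rebuilding the image
Network issues need investigating

Basic usage:

# Add an ephemeral debug container
kubectl debug -it <pod-name> --image=nicolaka/netshoot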

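# Share the process namespace (see other containers' processes)
# --share-processes applies to a pod copy created with --copy-to
kubectl debug -it <pod-name> --image=busybox --share-processes --copy-to=<pod-name>-debug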

# Target a specific container
kubectl debug -it <pod-name> --image=busybox --target=app

Recommended debug images:

nicolaka/netshoot (~380MB) - Network debugging (curl, dig, tcpdump, netstat)
busybox (~1MB) - Minimal shell and utilities
alpine (~5MB) - Lightweight, with a package manager
ubuntu (~70MB) - Full environment

Node debugging:

kubectl debug node/<node-name> -it --image=ubuntu

Docker container debugging:

docker exec -it <container-id> sh

# If no shell is available
docker run -it --pid=container:<container-id> \
           --net=container:<container-id> \
           busybox sh

For detailed container debugging patterns, see references/container-debugging.md.

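Production Debugging

Production Debugging Principles

Golden rules:

Minimal performance impact - Measure overhead and limit scope
No blocking operations - Use non-intrusive techniques
Security awareness - Avoid logging secrets and PII
Reversible - Be able to roll back quickly (feature flags, Git)
Observable - Structured logging, correlation IDs, tracing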
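
To make the "avoid logging secrets and PII" rule concrete, one option is to redact sensitive fields before a payload reaches the logger. The sketch below is only an illustration: the function name redact and the key list SENSITIVE_KEYS are assumptions for this example, and a real deployment would maintain its own list.

```python
# Hypothetical list of field names to mask; adjust to the application's data.
SENSITIVE_KEYS = {"password", "token", "authorization", "api_key", "ssn", "email"}


def redact(value, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of value with sensitive dictionary fields masked."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]"
            if str(key).lower() in sensitive_keys
            else redact(item, sensitive_keys)
            for key, item in value.items()
        }
    if isinstance(value, (list, tuple)):
        # Sequences come back as lists, with nested dictionaries redacted.
        return [redact(item, sensitive_keys) for item in value]
    return value
```

Key matching is case-insensitive and exact, so a field such as "user_password" would not be masked unless it is added to the list.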

Safe Production Techniques

1. Structured logging

import logging
import json

logger = logging.getLogger(__name__)
logger.info(json.dumps({
    "event": "user_login_failed",
    "user_id": user_id,
    "error": str(e),
    "correlation_id": request_id
}))
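
An alternative to calling json.dumps at every call site is to move the JSON encoding into a logging.Formatter, so that application code passes fields through the standard extra= argument. This is a minimal sketch, not a complete logging setup. The names JsonFormatter and build_logger, and the choice of which extra fields to copy, are assumptions for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (illustrative)."""

    # Extra fields copied into the output when present on the record.
    EXTRA_FIELDS = ("correlation_id", "user_id")

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Values passed via `extra=` become attributes on the record.
        for key in self.EXTRA_FIELDS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        if record.exc_info:
            payload["error"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, stream=None):
    """Create a logger that writes JSON lines to stream (stderr by default)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    return logger
```

A call such as build_logger("auth").info("user_login_failed", extra={"user_id": 42, "correlation_id": "abc-123"}) would then produce one JSON line per event.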

2. Correlation IDs (request tracing)

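// ctxKey is a private key type, so values stored in the context
// cannot collide with keys set by other packages.
type ctxKey string

const correlationIDKey ctxKey = "correlationID"

func handleRequest(w http.ResponseWriter, r *http.Request) {
    // Reuse the caller's ID if present; otherwise create one.
    correlationID := r.Header.Get("X-Correlation-ID")
    if correlationID == "" {
        correlationID = generateUUID() // assumed to be defined elsewhere
    }
    ctx := context.WithValue(r.Context(), correlationIDKey, correlationID)
    r = r.WithContext(ctx) // downstream handlers read the ID from r.Context()
    log.Printf("[%s] Processing request", correlationID)
}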
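
The same idea can be sketched in Python with contextvars, which keeps the ID available to any code running in the current request context without passing it through every function call. The function names and the header dictionary shape below are assumptions for illustration. A real web framework would supply its own request object.

```python
import contextvars
import uuid

# Holds the correlation ID for the current request context.
correlation_id_var = contextvars.ContextVar("correlation_id", default=None)


def ensure_correlation_id(headers):
    """Reuse the caller's X-Correlation-ID or mint a new one, then bind it."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    correlation_id = normalized.get("x-correlation-id") or str(uuid.uuid4())
    correlation_id_var.set(correlation_id)
    return correlation_id


def outgoing_headers():
    """Headers to attach to downstream calls so the ID follows the request."""
    correlation_id = correlation_id_var.get()
    return {"X-Correlation-ID": correlation_id} if correlation_id else {}
```

Forwarding outgoing_headers() on every downstream HTTP call is what lets logs from several services be joined on one ID.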

3. Distributed tracing (OpenTelemetry)

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def process_order(order_id):
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        span.add_event("Order validated")

4. Error tracking platforms

Sentry - Exception tracking with context
New Relic - APM with error tracking
Datadog - Logs, metrics, and traces
Rollbar - Error monitoring

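Production debugging workflow:

Detect - Error tracking alerts, log spikes, metric anomalies
Locate - Find the correlation ID, search the logs, review distributed traces
Reproduce - Try to reproduce in staging with sanitized production data
Fix - Put the change behind a feature flag and deploy to canary first
Verify - Check error rates, review logs, monitor traces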
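
The "Locate" step usually happens in a log aggregation tool, but the underlying operation is simple enough to sketch: scan JSON log lines and keep the entries whose correlation_id matches. The function name find_by_correlation_id and the one-JSON-object-per-line log format are assumptions for this example.

```python
import json


def find_by_correlation_id(lines, correlation_id):
    """Return parsed log entries whose correlation_id field matches."""
    matches = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON (plain text, stack traces)
        if isinstance(entry, dict) and entry.get("correlation_id") == correlation_id:
            matches.append(entry)
    return matches
```

For example, passing an open log file as lines and "abc-123" as the ID yields the events for that single request, in file order.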

For detailed production debugging patterns, see references/production-debugging.md.

Decision Framework

Which Debugger for Which Language?

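Language	Primary Tool	Installation	Best For
Python	pdb	Built-in	Simple scripts, server environments
	ipdb	`pip install ipdb`	Enhanced UX, IPython users
	debugpy	VS Code extension	IDE integration, remote debugging
Go	delve	`go install github.com/go-delve/delve/cmd/dlv@latest`	All Go debugging, goroutines
Rust	rust-lldb	System package	macOS, Linux, MSVC Windows
	rust-gdb	System package	Linux, GDB preference
Node.js	node --inspect	Built-in	All Node.js debugging, Chrome DevTools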

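Which Technique for Which Scenario?

Scenario	Recommended Technique	Tools
Local development	Interactive debugger	pdb, delve, lldb, node --inspect
Bug in a test	Test-specific debugging	pytest --pdb, dlv test, cargo test
Remote server	SSH tunnel + remote attach	VS Code Remote, debugpy
Container (local)	docker exec -it	sh/bash + debugger
Kubernetes pod	Ephemeral container	kubectl debug --image=nicolaka/netshoot
Distroless image	Ephemeral container (required)	kubectl debug with busybox/alpine
Production issue	Log analysis + error tracking	Structured logs, Sentry, correlation IDs
Goroutine deadlock	Goroutine inspection	delve goroutines -t
Crashed process	Core dump analysis	gdb core, lldb -c core
Distributed failure	Distributed tracing	OpenTelemetry, Jaeger, correlation IDs
Race condition	Race detector + debugger	go run -race, cargo test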

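Production Debugging Safety Checklist

Before debugging in production:

[ ] Will this affect performance? (Measure overhead)
[ ] Will this block users? (Use non-intrusive techniques)
[ ] Could this expose secrets? (Avoid variable dumps)
[ ] Is there a rollback plan? (Git branch, feature flag)
[ ] Have we tried logs first? (Less invasive)
[ ] Do we have correlation IDs? (Trace requests)
[ ] Is error tracking enabled? (Sentry, New Relic)
[ ] Can we reproduce it in staging? (Safer environment)
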
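Common Debugging Workflows

Workflow 1: Local Development Bug

Insert a breakpoint in the code (language-specific)
Start the debugger (dlv debug, rust-lldb, node --inspect-brk)
Run to the breakpoint (run, continue)
Inspect variables (print, frame variable)
Step through the code (next, step, finish)
Identify the problem and fix it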

Workflow 2: Test Failure Debugging

Python:

pytest --pdb  # Drop into pdb on failure

Go:

dlv test github.com/user/project/pkg
(dlv) break TestMyFunction
(dlv) continue

Rust:

cargo test --no-run
rust-lldb target/debug/deps/myapp-<hash>
(lldb) breakpoint set -n test_name
(lldb) run test_name

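Workflow 3: Kubernetes Pod Debugging

Scenario: a pod built on a distroless image has a network problem

# Step 1: Check the pod status
kubectl get pod my-app-pod -o wide

# Step 2: Check the logs first
kubectl logs my-app-pod

# Step 3: If the logs are not enough, add an ephemeral container
kubectl debug -it my-app-pod --image=nicolaka/netshoot

# Step 4: Investigate from inside the debug container
curl localhost:8080
netstat -tuln
nslookup api.example.com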

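Workflow 4: Production Error Investigation

Scenario: the API returns 500 errors

# Step 1: Check error tracking (Sentry)
# - Find the error details and stack trace
# - Copy the correlation ID from the error report

# Step 2: Search the logs for the correlation ID
# In a log aggregation tool (ELK, Splunk):
# correlation_id:"abc-123-def"

# Step 3: Review the distributed trace
# In a tracing tool (Jaeger, Datadog):
# Search by correlation ID and review the span timeline

# Step 4: Reproduce in staging
# Use sanitized production data if needed
# Add extra logging if needed

# Step 5: Fix and deploy
# Put the change behind a feature flag for a gradual rollout
# Deploy to the canary environment first
# Monitor error rates closely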

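Additional Resources

For language-specific deep dives:

references/python-debugging.md - Detailed guide to pdb, ipdb, pudb, and debugpy
references/go-debugging.md - Delve CLI, goroutine debugging, conditional breakpoints
references/rust-debugging.md - LLDB vs GDB, ownership debugging, macro debugging
references/nodejs-debugging.md - node --inspect, Chrome DevTools, Docker debugging

For environment-specific patterns:

references/container-debugging.md - kubectl debug, ephemeral containers, node debugging
references/production-debugging.md - Structured logging, correlation IDs, OpenTelemetry, error tracking

For decision support:

references/decision-trees.md - Extended debugging decision framework

For hands-on examples:

examples/ - Step-by-step debugging sessions for each language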
