Sandboxing Skill

Name: sandboxing
Version: 1.0.0
Domain: security/isolation
Risk level: HIGH
Languages: [python, c, rust, go]
Frameworks: [seccomp, apparmor, selinux, bubblewrap]
Requires security review: true
Compliance: [SOC2, FedRAMP]
Last updated: 2025-01-15

Mandatory reading protocol: before implementing sandboxing, read references/advanced-patterns.md for defense-in-depth strategies and references/threat-model.md for container escape scenarios.

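1. Overview

1.1 Purpose and Scope

This skill provides process isolation and sandboxing for JARVIS components:

Linux: seccomp-bpf, AppArmor/SELinux, namespaces, cgroups
Windows: AppContainer, Job Objects, restricted tokens
macOS: sandbox-exec, App Sandbox entitlements
Containers: Docker/Podman security contexts, Kubernetes SecurityContext

1.2 Risk Assessment

Risk level: HIGH

Rationale:

A sandbox escape can lead to full system compromise
A misconfiguration can negate every isolation benefit
Kernel vulnerabilities bypass user-space controls
Plugin/extension execution requires strong isolation

Attack surface:

Syscall filtering gaps
Namespace escape vectors
Capability misconfiguration
Resource exhaustion attacks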

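2. Core Responsibilities

2.1 Primary Functions

Isolate untrusted code execution so it cannot affect the host system
Restrict system calls to the minimum required set
Limit resources (CPU, memory, network, filesystem)
Enforce security policy through MAC (AppArmor/SELinux)
Contain failures to prevent cascading effects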
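
The resource-limiting responsibility listed above can be sketched with POSIX rlimits from the Python standard library. This is a minimal illustration under stated assumptions, not the skill's implementation: `run_with_limits`, `_lower_limit`, and the default values are hypothetical names and numbers chosen for the example, and rlimits are a complement to cgroups rather than a replacement.

```python
import resource
import subprocess
import sys


def _lower_limit(res, value):
    """Set soft and hard limits to value, never above the current hard limit."""
    _, hard = resource.getrlimit(res)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(res, (value, value))


def run_with_limits(command, cpu_seconds=2, memory_bytes=512 * 1024 * 1024,
                    max_open_files=64, timeout=10):
    """Run command in a child process with CPU, memory, and fd limits applied."""
    def apply_limits():
        # Runs in the child between fork() and exec(), so the parent is unaffected.
        _lower_limit(resource.RLIMIT_CPU, cpu_seconds)      # CPU time in seconds
        _lower_limit(resource.RLIMIT_AS, memory_bytes)      # address space in bytes
        _lower_limit(resource.RLIMIT_NOFILE, max_open_files)
        _lower_limit(resource.RLIMIT_CORE, 0)               # no core dumps

    return subprocess.run(command, preexec_fn=apply_limits,
                          capture_output=True, text=True, timeout=timeout)
```

A caller might run `run_with_limits([sys.executable, "script.py"])` and inspect `returncode`. Note that `preexec_fn` is documented as unsafe in multi-threaded parents, so a production version would more likely apply limits from a small launcher process or rely on cgroups.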

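2.2 Core Principles

TDD first: write sandbox restriction tests before implementing
Performance-aware: cache permissions, lazy-load capabilities, minimize syscall overhead
Defense in depth: combine multiple isolation layers
Least privilege: grant only the minimum permissions required
Fail secure: deny all access by default

2.3 Security Principles

NEVER run untrusted code without syscall filtering
NEVER grant CAP_SYS_ADMIN to a sandboxed process
ALWAYS drop every capability that is not explicitly required
ALWAYS use a read-only root filesystem where possible
ALWAYS apply defense in depth (multiple layers)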

3. Technology Stack

Platform	Primary	Secondary	MAC
Linux	seccomp-bpf	Namespaces	AppArmor/SELinux
Windows	AppContainer	Job Objects	WDAC
macOS	sandbox-exec	Entitlements	TCC
Containers	securityContext	RuntimeClass	Pod Security

Recommended tools: bubblewrap, firejail, nsjail, gVisor

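4. Implementation Patterns

4.1 Seccomp-BPF Filter (python-seccomp)

import os

import seccomp  # Python bindings that ship with libseccomp


def create_minimal_sandbox():
    """Create a minimal seccomp filter for untrusted code."""
    # Default action: kill on any syscall that is not explicitly allowed.
    syscall_filter = seccomp.SyscallFilter(defaction=seccomp.KILL)

    # Essential syscalls
    essential = [
        'read', 'write', 'close', 'fstat', 'lseek',
        'mmap', 'mprotect', 'munmap', 'brk',
        'rt_sigaction', 'rt_sigprocmask', 'rt_sigreturn',
        'exit', 'exit_group', 'futex', 'clock_gettime',
    ]

    for syscall in essential:
        syscall_filter.add_rule(seccomp.ALLOW, syscall)

    return syscall_filter


def run_sandboxed(func, *args, **kwargs):
    """Run func in a forked child under the seccomp filter; return True on success."""
    syscall_filter = create_minimal_sandbox()
    pid = os.fork()

    if pid == 0:
        # Child: once loaded, the filter cannot be removed.
        syscall_filter.load()
        try:
            func(*args, **kwargs)
            os._exit(0)
        except BaseException:
            os._exit(1)
    else:
        _, status = os.waitpid(pid, 0)
        # A child killed by the filter dies from a signal; WEXITSTATUS alone
        # would read as 0 in that case, so check WIFEXITED first.
        return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0

📚 For custom BPF filters and advanced seccomp:

See references/advanced-patterns.md#seccomp-bpf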
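
When the parent waits on a seccomp-confined child, it helps to distinguish a normal failure from a filter violation. The sketch below is a hypothetical helper (`classify_wait_status` is not part of the skill) and assumes the filter's kill action terminates the child as if by SIGSYS, which is how the kernel reports seccomp kills.

```python
import os
import signal


def classify_wait_status(status):
    """Turn a raw waitpid() status into a short, log-friendly label."""
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)
        return "ok" if code == 0 else f"exit:{code}"
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        if sig == signal.SIGSYS:
            # Assumption: SIGSYS here means a blocked syscall was attempted.
            return "seccomp-violation"
        return f"signal:{signal.Signals(sig).name}"
    return "unknown"
```

A caller that forks a sandboxed child can pass the second element of `os.waitpid(pid, 0)` to this function and alert on the `seccomp-violation` label.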

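4.2 Bubblewrap Sandbox (Recommended)

import subprocess
from typing import List

class BubblewrapSandbox:
    """High-level sandboxing with bubblewrap."""

    def __init__(self):
        self._args = ['bwrap']

    def with_minimal_filesystem(self) -> 'BubblewrapSandbox':
        # Read-only system directories plus private /proc, /dev, and /tmp.
        self._args.extend([
            '--ro-bind', '/usr', '/usr',
            '--ro-bind', '/lib', '/lib',
            '--ro-bind', '/lib64', '/lib64',
            '--symlink', 'usr/bin', '/bin',
            '--proc', '/proc', '--dev', '/dev',
            '--tmpfs', '/tmp',
        ])
        return self

    def with_readonly_file(self, host_path: str, sandbox_path: str) -> 'BubblewrapSandbox':
        # Expose a single host file inside the sandbox, read-only.
        self._args.extend(['--ro-bind', host_path, sandbox_path])
        return self

    def with_network_isolation(self) -> 'BubblewrapSandbox':
        self._args.append('--unshare-net')
        return self

    def drop_capabilities(self) -> 'BubblewrapSandbox':
        # The flag and its value must be separate argv entries.
        self._args.extend(['--cap-drop', 'ALL'])
        return self

    def run(self, command: List[str], timeout: int = 30):
        return subprocess.run(
            self._args + ['--'] + command,
            capture_output=True, timeout=timeout
        )

# Usage
def run_untrusted_script(script_path: str) -> str:
    sandbox = BubblewrapSandbox()
    sandbox.with_minimal_filesystem().with_network_isolation().drop_capabilities()
    # The script is not visible inside the sandbox unless it is bound in explicitly.
    sandbox.with_readonly_file(script_path, '/tmp/script.py')
    result = sandbox.run(['python3', '/tmp/script.py'], timeout=10)
    return result.stdout.decode()

📚 For namespace isolation and advanced bubblewrap:

See references/advanced-patterns.md#namespaces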
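
A sandbox runner also has to cope with commands that hang or flood their output. The following is a minimal sketch of one way to wrap `subprocess.run` for that purpose; `SandboxResult` and `run_bounded` are hypothetical names, and the size cap only bounds what is returned to the caller, not what is buffered while the child runs.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SandboxResult:
    returncode: Optional[int]  # None when the command timed out
    stdout: str
    stderr: str
    timed_out: bool


def _decode(data: Optional[bytes], limit: int) -> str:
    """Truncate captured bytes to limit and decode without raising."""
    return (data or b"")[:limit].decode("utf-8", errors="replace")


def run_bounded(argv: List[str], timeout: float = 10.0,
                max_output: int = 64 * 1024) -> SandboxResult:
    """Run argv, converting a timeout into a result instead of an exception."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        # subprocess.run has already killed the direct child at this point.
        return SandboxResult(None, _decode(exc.stdout, max_output),
                             _decode(exc.stderr, max_output), True)
    return SandboxResult(proc.returncode, _decode(proc.stdout, max_output),
                         _decode(proc.stderr, max_output), False)
```

When `argv` starts with `bwrap`, only the bwrap process itself is killed on timeout; adding bubblewrap's `--die-with-parent` option is one way to make the sandboxed command go down with it.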

4.3 Kubernetes SecurityContext

apiVersion: v1
kind: Pod
metadata:
  name: jarvis-worker
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 1000
    seccompProfile:
      type: RuntimeDefault

  containers:
  - name: worker
    image: jarvis-worker:latest
    securityContext:
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: [ALL]

    resources:
      limits:
        cpu: "1"
        memory: "512Mi"

    volumeMounts:
    - name: tmp
      mountPath: /tmp

  volumes:
  - name: tmp
    emptyDir:
      medium: Memory
      sizeLimit: 64Mi

5. Implementation Workflow (TDD)

Step 1: Write failing tests first

import pytest
from sandbox import SandboxManager

class TestSandboxRestrictions:
    """Test sandbox isolation before implementing it."""

    @pytest.fixture
    def sandbox(self):
        return SandboxManager()

    def test_network_blocked(self, sandbox):
        """Write first: network access must be blocked."""
        result = sandbox.run(['curl', '-s', 'http://example.com'])
        assert result.returncode != 0, "network should be blocked"

    def test_filesystem_readonly(self, sandbox):
        """Write first: the root filesystem must be read-only."""
        result = sandbox.run(['touch', '/test-file'])
        assert result.returncode != 0, "root filesystem should be read-only"

    def test_capabilities_dropped(self, sandbox):
        """Write first: all capabilities must be dropped."""
        result = sandbox.run(['cat', '/proc/self/status'])
        assert 'CapEff:\t0000000000000000' in result.stdout

    def test_syscall_blocked(self, sandbox):
        """Write first: dangerous syscalls must be blocked."""
        # ptrace should be blocked by seccomp
        result = sandbox.run(['strace', 'ls'])
        assert result.returncode != 0, "ptrace should be blocked"

    def test_escape_attempt_fails(self, sandbox):
        """Write first: container escape attempts must fail."""
        result = sandbox.run(['ls', '/proc/1/root'])
        assert result.returncode != 0, "namespace escape should be blocked"
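
The capability test above compares against a fixed string. When a test fails, it is often more useful to know which capabilities leaked. The sketch below parses the `CapEff` bitmask from `/proc/<pid>/status` text; `effective_capabilities`, `capability_names`, and the partial `CAP_BITS` table are hypothetical helpers for illustration, with bit numbers taken from the Linux capability definitions.

```python
import re
from typing import List

# Partial map of Linux capability bit numbers to names (illustrative subset).
CAP_BITS = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def effective_capabilities(status_text: str) -> int:
    """Return the CapEff bitmask from /proc/<pid>/status contents."""
    match = re.search(r"^CapEff:\s*([0-9a-fA-F]+)\s*$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("CapEff line not found")
    return int(match.group(1), 16)


def capability_names(mask: int) -> List[str]:
    """List the capabilities set in mask, lowest bit first."""
    names = []
    bit = 0
    while mask:
        if mask & 1:
            # Bits outside the subset above are reported by number.
            names.append(CAP_BITS.get(bit, f"cap_{bit}"))
        mask >>= 1
        bit += 1
    return names
```

A test can then assert `capability_names(effective_capabilities(result.stdout)) == []`, so a failure message names the offending capability instead of showing a hex string.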

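Step 2: Implement the minimum needed to pass the tests

import subprocess

class SandboxManager:
    def __init__(self, seccomp_fd=None):
        # seccomp_fd (hypothetical parameter): an open file descriptor holding a
        # compiled BPF program. bwrap reads the filter from this descriptor.
        self._seccomp_fd = seccomp_fd
        self._bwrap_args = ['bwrap', '--unshare-net', '--ro-bind', '/', '/',
                           '--cap-drop', 'ALL']
        if seccomp_fd is not None:
            self._bwrap_args.extend(['--seccomp', str(seccomp_fd)])

    def run(self, command, timeout=30):
        # subprocess closes inherited descriptors by default, so the filter fd
        # has to be passed through explicitly.
        pass_fds = () if self._seccomp_fd is None else (self._seccomp_fd,)
        return subprocess.run(self._bwrap_args + ['--'] + command,
                              capture_output=True, text=True, timeout=timeout,
                              pass_fds=pass_fds)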

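Step 3: Refactor with defense in depth

class SandboxManager:
    def __init__(self, profile: str = 'strict', seccomp_fd=None):
        # seccomp_fd: an open fd holding a compiled BPF program, passed to bwrap via pass_fds in run()
        self._seccomp_fd = seccomp_fd
        self._bwrap_args = ['bwrap', '--unshare-all']
        if profile == 'network':
            self._bwrap_args.append('--share-net')
        # Only /usr is exposed; bind or symlink /lib, /lib64, and /bin as the target binaries require.
        self._bwrap_args.extend(['--ro-bind', '/usr', '/usr', '--tmpfs', '/tmp',
                                 '--cap-drop', 'ALL'])
        if seccomp_fd is not None:
            self._bwrap_args.extend(['--seccomp', str(seccomp_fd)])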

Step 4: Run full verification

# Run all sandbox tests
pytest tests/sandbox/ -v --tb=short

# Test specific isolation features
pytest tests/sandbox/test_network.py -v
pytest tests/sandbox/test_capabilities.py -v
pytest tests/sandbox/test_escapes.py -v

# Security audit
python -m security_audit --sandbox

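6. Performance Patterns

6.1 Permission Caching

from time import time

# Bad: loads permissions from disk on every operation
def run_sandboxed(command):
    permissions = load_permissions_from_disk()  # slow I/O on every call
    return execute(command)

# Good: cache with a TTL
class PermissionCache:
    def __init__(self, ttl=300):
        self._cache, self._ttl = {}, ttl

    def get(self, profile):
        # Entries are (permissions, load_time); a revoked permission can stay
        # cached for up to ttl seconds, so keep the TTL short for sensitive profiles.
        if profile in self._cache and time() - self._cache[profile][1] < self._ttl:
            return self._cache[profile][0]
        perms = load_from_disk(profile)
        self._cache[profile] = (perms, time())
        return perms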

6.2 Lazy Capability Loading

# Bad: loads every security module at startup
class Sandbox:
    def __init__(self):
        self.seccomp = load_seccomp_filters()      # expensive
        self.apparmor = load_apparmor_profiles()   # expensive

# Good: load lazily, only when needed
class Sandbox:
    _seccomp = None

    @property
    def seccomp(self):
        if self._seccomp is None:
            self._seccomp = load_seccomp_filters()
        return self._seccomp

6.3 Efficient IPC

import mmap

# Bad: serializes the full state on every call
def send_to_sandbox(data):
    return sandbox.communicate(serialize_full_state() + data)

# Good: use shared memory for large data
class EfficientIPC:
    def __init__(self, size=1024*1024):
        # Anonymous mapping; it is shared only with children forked after this point.
        self._shm = mmap.mmap(-1, size)

    def send(self, data):
        self._shm.seek(0)
        self._shm.write(data)

    def recv(self, size):
        self._shm.seek(0)
        return self._shm.read(size)
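
The shared-memory pattern above leaves the receiver guessing how many bytes to read. One possible refinement, sketched here under the assumption that the sandboxed side is a forked child on a Unix system, is to prefix each message with its length. `SharedBuffer` and its 4-byte little-endian header are hypothetical choices for this example, and the sketch does no locking, so it assumes one writer finishes before the reader starts.

```python
import mmap
import os
import struct

HEADER = struct.Struct("<I")  # 4-byte little-endian payload length


class SharedBuffer:
    """Length-prefixed messages over an anonymous shared mapping."""

    def __init__(self, size=1024 * 1024):
        self._size = size
        # On Unix, an anonymous mmap defaults to MAP_SHARED and survives fork().
        self._shm = mmap.mmap(-1, size)

    def send(self, data: bytes) -> None:
        if HEADER.size + len(data) > self._size:
            raise ValueError("payload does not fit in the shared buffer")
        self._shm.seek(0)
        self._shm.write(HEADER.pack(len(data)) + data)

    def recv(self) -> bytes:
        self._shm.seek(0)
        (length,) = HEADER.unpack(self._shm.read(HEADER.size))
        # Never trust a length written by the untrusted side.
        if length > self._size - HEADER.size:
            raise ValueError("corrupt length header")
        return self._shm.read(length)
```

The bounds check in `recv` matters because the header is written by the sandboxed process and must be treated as untrusted input.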

6.4 Resource Pooling

from queue import Queue

# Bad: creates a new sandbox for every task
for task in tasks:
    sandbox = create_sandbox()  # expensive
    sandbox.run(task)
    sandbox.destroy()

# Good: pool and reuse
class SandboxPool:
    def __init__(self, size=4):
        self._pool = Queue(size)
        for _ in range(size):
            self._pool.put(create_sandbox())

    def acquire(self):
        return self._pool.get()

    def release(self, sb):
        # Reset state before reuse so nothing leaks between tasks.
        sb.reset()
        self._pool.put(sb)
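
With manual acquire and release, a task that raises can leave a sandbox checked out forever. A context manager is one way to guarantee the release step. The sketch below is illustrative only: `LeasedSandboxPool`, `lease`, and the `factory` argument are hypothetical, and it assumes sandbox objects expose the same `reset()` method used above.

```python
from contextlib import contextmanager
from queue import Queue


class LeasedSandboxPool:
    """Pool that always returns a sandbox, even when the task fails."""

    def __init__(self, factory, size=4):
        self._factory = factory
        self._pool = Queue(size)
        for _ in range(size):
            self._pool.put(factory())

    @contextmanager
    def lease(self, timeout=None):
        # Blocks until a sandbox is free; raises queue.Empty on timeout.
        sandbox = self._pool.get(timeout=timeout)
        try:
            yield sandbox
        finally:
            try:
                sandbox.reset()
            except Exception:
                # A sandbox that cannot be reset is discarded and replaced.
                sandbox = self._factory()
            self._pool.put(sandbox)
```

Usage would look like `with pool.lease() as sb: sb.run(task)`. Replacing a sandbox whose reset fails is a deliberate choice here: reusing a sandbox in an unknown state would undermine isolation between tasks.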

6.5 Minimal Permission Sets

# Bad: requests every capability up front
CAPS = ['CAP_NET_ADMIN', 'CAP_SYS_ADMIN', 'CAP_DAC_OVERRIDE', ...]

# Good: the minimal set for each operation
CAPABILITY_SETS = {
    'network_bind': ['CAP_NET_BIND_SERVICE'],
    'file_read': [],
    'file_write': ['CAP_DAC_OVERRIDE'],
}

def get_caps(op):
    # Unknown operations get no capabilities (fail secure).
    return CAPABILITY_SETS.get(op, [])

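7. Security Standards

7.1 Known Vulnerabilities

CVE	Severity	Component	Mitigation
CVE-2024-21626	Critical	runC	Container escape - use runC 1.1.12+
CVE-2022-0185	High	Linux kernel	Heap overflow - update the kernel
CVE-2022-0492	High	cgroups	Escape - drop CAP_SYS_ADMIN
CVE-2022-0847	High	Linux kernel	Dirty Pipe - use kernel 5.16.11+, 5.15.25+, or 5.10.102+
CVE-2023-2431	Low	Kubernetes	Seccomp bypass - apply the K8s patch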
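
The runC row in the table above lends itself to an automated check. The following is a minimal sketch, assuming the version string comes from something like the output of `runc --version`; `parse_version` and `runc_is_patched` are hypothetical helpers, and pre-release suffixes such as `-rc.1` are ignored by this simple parser.

```python
import re
from typing import Tuple

# First runC release with the CVE-2024-21626 fix, per the table above.
RUNC_FIXED = (1, 1, 12)


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first MAJOR.MINOR.PATCH triple found in text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def runc_is_patched(version_output: str) -> bool:
    """True when the reported runC version is at or above the fixed release."""
    return parse_version(version_output) >= RUNC_FIXED
```

Comparing integer tuples avoids the classic string-comparison mistake where `"1.1.9"` sorts after `"1.1.12"`.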

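7.2 OWASP Mapping

OWASP Top 10 (2021)	Risk	Implementation
A01: Broken Access Control	Critical	Namespace isolation, MAC
A04: Insecure Design	High	Defense in depth
A05: Security Misconfiguration	Critical	Secure defaults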

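7.3 Defense-in-Depth Layers

Seccomp: syscall filtering
Namespaces: resource isolation
Capabilities: privilege reduction
MAC: mandatory access control (AppArmor/SELinux)
Cgroups: resource limits

📚 Detailed OWASP coverage:

See references/security-examples.md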
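
Some of the layers listed above can be verified from inside the sandbox by reading `/proc/self/status`, which reports the seccomp mode (`Seccomp: 0`, `1`, or `2`) and the `NoNewPrivs` flag. The sketch below is a hypothetical helper, not part of the skill; it covers only these two fields and says nothing about namespaces, MAC, or cgroups.

```python
from typing import Dict, List

SECCOMP_MODES = {"0": "disabled", "1": "strict", "2": "filter"}


def parse_status(status_text: str) -> Dict[str, str]:
    """Parse 'Key:<whitespace>value' lines from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_layers(status_text: str) -> List[str]:
    """Return the checked layers that are NOT active for the process."""
    fields = parse_status(status_text)
    missing = []
    if SECCOMP_MODES.get(fields.get("Seccomp", ""), "unknown") != "filter":
        missing.append("seccomp filter")
    if fields.get("NoNewPrivs") != "1":
        missing.append("no_new_privs")
    return missing
```

A test could run `cat /proc/self/status` inside the sandbox and assert that `missing_layers` returns an empty list.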

8. Testing Requirements

class TestSandboxSecurity:
    def test_network_isolated(self, sandbox):
        assert sandbox.run(['curl', '-s', 'https://example.com']).returncode != 0

    def test_capabilities_dropped(self, sandbox):
        # Match the full zero mask; a bare 'CapEff:\t0' prefix also matches non-empty masks.
        assert 'CapEff:\t0000000000000000' in sandbox.run(['cat', '/proc/self/status']).stdout

    def test_escape_attempts_blocked(self, sandbox):
        assert sandbox.run(['ls', '/proc/1/root']).returncode != 0

📚 Full test suite: see references/security-examples.md#testing

9. Common Mistakes

Critical Anti-Patterns

# ❌ NEVER: runAsUser: 0 (root)          ✅ ALWAYS: runAsNonRoot: true, runAsUser: 1000
# ❌ NEVER: add: [SYS_ADMIN]              ✅ ALWAYS: drop: [ALL], add back only what is needed
# ❌ NEVER: privileged: true              ✅ ALWAYS: privileged: false
# ❌ NEVER: no seccomp profile            ✅ ALWAYS: seccompProfile: RuntimeDefault

# Secure configuration example
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  privileged: false
  allowPrivilegeEscalation: false
  capabilities: {drop: [ALL]}
  seccompProfile: {type: RuntimeDefault}

📚 Full anti-pattern list: see references/advanced-patterns.md#anti-patterns
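
The anti-patterns above are mechanical enough to check in code, for example in CI before a manifest is applied. The sketch below is a minimal illustration that operates on a container `securityContext` already parsed into a dict; `audit_security_context` and its finding messages are hypothetical, and it covers only the rules listed in this section.

```python
from typing import Any, Dict, List


def audit_security_context(ctx: Dict[str, Any]) -> List[str]:
    """Return a list of findings; an empty list means no anti-pattern was detected."""
    findings = []
    if ctx.get("privileged", False):
        findings.append("privileged must be false")
    # Require an explicit false rather than relying on defaults.
    if ctx.get("allowPrivilegeEscalation") is not False:
        findings.append("allowPrivilegeEscalation must be false")
    if ctx.get("runAsNonRoot") is not True:
        findings.append("runAsNonRoot must be true")
    if ctx.get("runAsUser") == 0:
        findings.append("runAsUser must not be 0 (root)")
    caps = ctx.get("capabilities") or {}
    if "ALL" not in (caps.get("drop") or []):
        findings.append("capabilities.drop must include ALL")
    if "SYS_ADMIN" in (caps.get("add") or []):
        findings.append("capabilities.add must not include SYS_ADMIN")
    profile = (ctx.get("seccompProfile") or {}).get("type")
    if profile not in ("RuntimeDefault", "Localhost"):
        findings.append("seccompProfile must be RuntimeDefault or Localhost")
    return findings
```

Passing the secure configuration shown above yields an empty list, while a context such as `{"privileged": True}` produces one finding per violated rule.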

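10. Pre-Implementation Checklist

Phase 1: Before writing code

[ ] Identify isolation requirements from the PRD
[ ] Review the threat model to understand attack vectors
[ ] Define the minimal capability set
[ ] Choose the appropriate isolation layers
[ ] Write failing tests for every restriction

Phase 2: During implementation

[ ] Implement the defense-in-depth layers
[ ] Drop all capabilities, add back only what is required
[ ] Apply a seccomp filter to block syscalls
[ ] Configure namespace isolation
[ ] Set resource limits (cgroups)
[ ] Use a read-only root filesystem
[ ] Run the tests after adding each layer

Phase 3: Before committing

[ ] All sandbox restriction tests pass
[ ] Escape attempt tests verified
[ ] No container runs as root
[ ] allowPrivilegeEscalation: false
[ ] seccompProfile: RuntimeDefault or stricter
[ ] Resource limits defined
[ ] Security audit completed
[ ] Performance benchmarks acceptable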

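11. Summary

Key Goals

Layered defense: combine seccomp, namespaces, capabilities, and MAC
Least privilege: drop all capabilities and run as non-root
Syscall filtering: block dangerous syscalls by default
Container hardening: read-only filesystem, no privilege escalation

Security Reminders

A single misconfiguration can negate all sandboxing
Defense in depth is essential - no single layer is sufficient
Test escape attempts as part of security validation

References

references/advanced-patterns.md - custom seccomp, gVisor, namespaces
references/security-examples.md - platform-specific implementations
references/threat-model.md - container escape scenarios

Sandboxing is your last line of defense. When everything else fails, the sandbox must hold.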