Model Equivariance Auditor Skill: model-equivariance-auditor

The Model Equivariance Auditor is a skill for verifying that machine learning models correctly implement their intended symmetries (equivariance). It provides systematic testing methods, layer-wise analysis, and debugging guidance to help developers detect and fix symmetry bugs, ensuring consistent training and prediction. Keywords: model equivariance, auditing, verification, debugging, symmetry, machine learning, deep learning, AI testing.

Machine Learning · 0 installs · 0 views · Updated 3/22/2026

name: Model Equivariance Auditor description: Use when you have implemented an equivariant model and need to verify that it correctly respects the intended symmetry. Invoke when the user mentions testing model equivariance, debugging symmetry bugs, verifying implementation correctness, checking whether a model is actually equivariant, or diagnosing why an equivariant model is not working. Provides verification tests and debugging guidance.

Model Equivariance Auditor

What is it?

This skill helps you verify that the model you implemented correctly respects its intended symmetry. Even when an equivariance library is used, implementation mistakes can still break equivariance. The skill provides systematic verification tests and debugging strategies.

Why audit? A model that claims to be equivariant but is not will train poorly and give inconsistent predictions. Catching these bugs early saves debugging time.

Workflow

Copy this checklist and track your progress:

Equivariance Audit Progress:
- [ ] Step 1: Gather model and symmetry specification
- [ ] Step 2: Run numerical equivariance tests
- [ ] Step 3: Test individual layers
- [ ] Step 4: Check gradient equivariance
- [ ] Step 5: Identify and diagnose failures
- [ ] Step 6: Document audit results

Step 1: Gather model and symmetry specification

Collect: the implemented model, the intended symmetry group, whether each output should be invariant or equivariant, and the transformation functions for the input and output spaces. Review the architecture specification from the design phase. Clarify any ambiguities with the user before testing.
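The gathered information can be recorded as a small specification before any tests run. The sketch below is purely illustrative; every field name and value is an assumption, not a required schema:

```python
# Illustrative audit specification (all field names and values are
# assumptions for this example, not a required schema):
audit_spec = {
    "model": "SegmentationNet",          # hypothetical model under audit
    "group": "C4",                       # intended symmetry group
    "output_behavior": "equivariant",    # or "invariant"
    "input_transform": "rot90 over the (H, W) dims",
    "output_transform": "rot90 over the (H, W) dims",
    "open_questions": ["Is the final pooling meant to be invariant?"],
}
```

Writing this down up front makes Steps 2-4 mechanical: each test plugs in the group, the two transforms, and the expected output behavior.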

Step 2: Run numerical equivariance tests

Run end-to-end equivariance tests using the Test Implementation. For invariance: verify ||f(T(x)) - f(x)|| < ε. For equivariance: verify ||f(T(x)) - T'(f(x))|| < ε. Use multiple random inputs and transformations. Record error statistics. For thresholds, see Error Interpretation. For ready-to-use test code, see Test Code Templates.

Step 3: Test individual layers

If the end-to-end test fails, isolate the problem by testing layers individually. For each layer: freeze the other layers and test that layer's equivariance on its own. This identifies which layer breaks equivariance. Use the Layer-wise Testing protocol. Pay particularly close attention to nonlinearities, normalization, and custom operations.

Step 4: Check gradient equivariance

Verify that gradients also respect equivariance (important for training). Compute gradients at x and at T(x). Check that the gradients transform appropriately. Gradient bugs can cause training to "forget" equivariance. See Gradient Testing.

Step 5: Identify and diagnose failures

If tests fail, diagnose using Common Failure Modes. Check for: non-equivariant nonlinearities, batch-normalization issues, incorrect output transformations, numerical-precision problems, and implementation bugs in custom layers. Provide concrete fix suggestions. For step-by-step troubleshooting, consult the Debugging Guide.

Step 6: Document audit results

Create the audit report using the Output Template. Include: pass/fail for each test, error magnitudes, identified issues, and recommendations. Distinguish between exact equivariance (numerical precision), approximate equivariance (acceptable error), and broken equivariance (fix required). For the detailed audit methodology, see Methodology Details. Quality criteria for this output are defined in the Quality Rubric.

Test Implementation

End-to-End Equivariance Test

import numpy as np
import torch

def test_model_equivariance(model, x, input_transform, output_transform,
                            n_tests=100, tol=1e-5):
    """
    Test if model is equivariant: f(T(x)) ≈ T'(f(x))

    Args:
        model: The neural network to test
        x: Sample input tensor
        input_transform: Function that transforms input
        output_transform: Function that transforms output
        n_tests: Number of random transformations to test
        tol: Error tolerance

    Returns:
        dict with test results
    """
    model.eval()
    errors = []

    with torch.no_grad():
        for _ in range(n_tests):
            # Generate random transformation (sample_random_transform is
            # supplied by the auditor for the symmetry group under test)
            T = sample_random_transform()

            # Method 1: Transform input, then apply model
            x_transformed = input_transform(x, T)
            y1 = model(x_transformed)

            # Method 2: Apply model, then transform output
            y = model(x)
            y2 = output_transform(y, T)

            # Compute error
            error = torch.norm(y1 - y2).item()
            relative_error = error / (torch.norm(y2).item() + 1e-8)
            errors.append({
                'absolute': error,
                'relative': relative_error
            })

    return {
        'mean_absolute': np.mean([e['absolute'] for e in errors]),
        'max_absolute': np.max([e['absolute'] for e in errors]),
        'mean_relative': np.mean([e['relative'] for e in errors]),
        'max_relative': np.max([e['relative'] for e in errors]),
        'pass': all(e['relative'] < tol for e in errors)
    }
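As a concrete instantiation of the template above, the sketch below audits a pointwise convolution under the C4 rotation group. This is an assumed example setup, not part of the skill: `sample_random_transform` draws a rotation count, and a real audit would substitute the actual group, transforms, and model.

```python
import torch
import torch.nn as nn

# Hypothetical C4 setup: transformations are k*90-degree image rotations.
def sample_random_transform():
    return int(torch.randint(0, 4, (1,)).item())

def rotate(x, k):
    # Rotate the spatial (H, W) dims of an NCHW tensor by k*90 degrees.
    return torch.rot90(x, k, dims=(-2, -1))

# A 1x1 convolution acts pointwise, so it is exactly rotation-equivariant.
model = nn.Conv2d(3, 8, kernel_size=1)
model.eval()

x = torch.randn(2, 3, 16, 16)
max_err = 0.0
with torch.no_grad():
    for _ in range(20):
        k = sample_random_transform()
        err = torch.norm(model(rotate(x, k)) - rotate(model(x), k)).item()
        max_err = max(max_err, err)

print(f"max equivariance error: {max_err:.2e}")
```

For this model the error stays at float32 round-off scale; a genuinely non-equivariant model (e.g. a 3x3 conv with zero padding under rotation) produces errors orders of magnitude larger.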

Invariance Test (Simpler Case)

def test_model_invariance(model, x, transform, n_tests=100, tol=1e-5):
    """Test if model output is invariant to transformations."""
    model.eval()
    errors = []

    with torch.no_grad():
        y_original = model(x)

        for _ in range(n_tests):
            T = sample_random_transform()
            x_transformed = transform(x, T)
            y_transformed = model(x_transformed)

            error = torch.norm(y_transformed - y_original).item()
            errors.append(error)

    return {
        'mean_error': np.mean(errors),
        'max_error': np.max(errors),
        'pass': max(errors) < tol
    }

Layer-wise Testing

Protocol

def test_layer_equivariance(layer, x, input_transform, output_transform,
                            tol=1e-5):
    """Test a single layer for equivariance."""
    layer.eval()

    with torch.no_grad():
        T = sample_random_transform()

        # Transform then layer
        y1 = layer(input_transform(x, T))

        # Layer then transform
        y2 = output_transform(layer(x), T)

        error = torch.norm(y1 - y2).item()

    return {
        'layer': layer.__class__.__name__,
        'error': error,
        'pass': error < tol
    }

def audit_all_layers(model, x, transforms):
    """Test each layer individually."""
    results = []

    for name, layer in model.named_modules():
        if is_testable_layer(layer):
            result = test_layer_equivariance(layer, x, *transforms)
            result['name'] = name
            results.append(result)

    return results
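`is_testable_layer` is left to the auditor. One possible sketch (an assumption; tune it to your model) is to audit only leaf modules, skipping wrappers that are equivariant by construction:

```python
import torch.nn as nn

# One possible is_testable_layer: audit only leaf modules, and skip
# modules that are trivially equivariant (identity, dropout in eval mode).
def is_testable_layer(layer):
    is_leaf = len(list(layer.children())) == 0
    skip = (nn.Identity, nn.Dropout)
    return is_leaf and not isinstance(layer, skip)
```

Testing only leaves avoids double-counting: a failing `nn.Sequential` container would otherwise be reported alongside the single child layer that actually causes the failure.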

What to Test Per Layer

Layer Type      What to Check
-------------   ----------------------------
Convolution     Kernel equivariance
Nonlinearity    Should preserve equivariance
Normalization   Often breaks equivariance
Pooling         Correct aggregation
Linear          Weight sharing patterns
Attention       Permutation equivariance

Gradient Testing

Why Test Gradients?

Forward pass can be equivariant while backward pass is not. This causes:

  • Training instability
  • Model “unlearning” equivariance
  • Inconsistent optimization

Gradient Equivariance Test

def test_gradient_equivariance(model, x, loss_fn, transform, tol=1e-4):
    """Test if gradients respect equivariance."""
    model.train()

    # Gradients at original input
    x1 = x.clone().requires_grad_(True)
    y1 = model(x1)
    loss1 = loss_fn(y1)
    loss1.backward()
    grad1 = x1.grad.clone()

    # Gradients at transformed input
    model.zero_grad()
    T = sample_random_transform()
    x2 = transform(x.clone(), T).requires_grad_(True)
    y2 = model(x2)
    loss2 = loss_fn(y2)
    loss2.backward()
    grad2 = x2.grad.clone()

    # Transform grad1 and compare to grad2
    grad1_transformed = transform_gradient(grad1, T)
    error = torch.norm(grad2 - grad1_transformed).item()

    return {'error': error, 'pass': error < tol}
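`transform_gradient` depends on the group. For an orthogonal spatial transform T and an invariant loss, the chain rule gives ∇L(T(x)) = T(∇L(x)): the input gradient transforms along with the input. The sketch below assumes k*90° rotations as a concrete case:

```python
import torch

# For orthogonal transforms and an invariant loss, the input gradient
# rotates along with the input: grad at rot(x) == rot(grad at x).
# (Assumed concrete group: k*90-degree rotations of NCHW tensors.)
def transform_gradient(grad, k):
    return torch.rot90(grad, k, dims=(-2, -1))

# Quick self-check with the rotation-invariant loss L(x) = sum(x**2):
x0 = torch.randn(1, 1, 4, 4)

x = x0.clone().requires_grad_(True)
(x ** 2).sum().backward()          # grad = 2*x

xr = torch.rot90(x0, 1, dims=(-2, -1)).clone().requires_grad_(True)
(xr ** 2).sum().backward()         # grad = 2*rot(x)

match = torch.allclose(xr.grad, transform_gradient(x.grad, 1), atol=1e-6)
```

For non-orthogonal groups the rule involves the inverse transpose of T, so derive the gradient transformation for your group rather than reusing this one.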

Error Interpretation

Error Thresholds

Error Level      Interpretation                 Action
--------------   ----------------------------   ------------
< 1e-6           Perfect (float32 precision)    Pass
1e-6 to 1e-4     Excellent (acceptable)         Pass
1e-4 to 1e-2     Approximate equivariance       Investigate
> 1e-2           Broken equivariance            Fix required

Relative vs Absolute Error

  • Absolute error: Raw difference magnitude
  • Relative error: Normalized by output magnitude

Use relative error when output magnitudes vary. Use absolute when comparing to numerical precision.
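A small helper makes the distinction concrete. With a reference output of norm ~100, an absolute error of 1e-3 still means roughly five significant figures of agreement:

```python
import numpy as np

def equivariance_error(y1, y2, eps=1e-8):
    # Absolute: raw difference magnitude.
    # Relative: normalized by the reference magnitude, i.e. roughly
    # "how many digits agree" regardless of output scale.
    absolute = float(np.linalg.norm(y1 - y2))
    relative = absolute / (float(np.linalg.norm(y2)) + eps)
    return absolute, relative

a, r = equivariance_error(np.array([100.0, 0.001]), np.array([100.0, 0.0]))
# absolute ~1e-3 looks alarming; relative ~1e-5 says the match is good
```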

Common Failure Modes

1. Non-Equivariant Nonlinearity

Symptom: Error increases after nonlinearity layers
Cause: Applying ReLU or sigmoid directly to equivariant features
Fix: Use gated nonlinearities, norm-based nonlinearities, or restrict them to invariant features
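A sketch of the norm-based option (one of several fixes; the bias value and shape convention are assumptions). The nonlinearity acts only on each feature vector's length, so any rotation of the vectors commutes with it:

```python
import math
import torch

def norm_relu(x, bias=0.5, eps=1e-8):
    # x: (..., d) vector features. Shrink the norm through a shifted ReLU
    # and rescale; the direction is untouched, so rotations of the
    # feature vectors commute with this operation.
    n = x.norm(dim=-1, keepdim=True)
    return x * (torch.relu(n - bias) / (n + eps))

# Equivariance check under a 2D rotation R:
c, s = math.cos(0.7), math.sin(0.7)
R = torch.tensor([[c, -s], [s, c]])
x = torch.randn(5, 2)
ok = torch.allclose(norm_relu(x @ R.T), norm_relu(x) @ R.T, atol=1e-5)
```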

2. Batch Normalization Breaking Equivariance

Symptom: Error varies with batch composition
Cause: BN computes different statistics for different orientations
Fix: Use LayerNorm, GroupNorm, or an equivariant batch norm
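A sketch of the drop-in swap (assuming 2D features; the function name is ours). It walks the module tree and replaces each BatchNorm2d with a per-sample GroupNorm. Whether the result is equivariant still depends on how the group acts on channels, so rerun the audit afterwards:

```python
import torch.nn as nn

def replace_batchnorm_with_groupnorm(module):
    # Recursively swap BatchNorm2d for GroupNorm(1, C), which normalizes
    # per sample rather than across the batch, so its statistics cannot
    # depend on which orientations happen to share a batch.
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, nn.GroupNorm(1, child.num_features))
        else:
            replace_batchnorm_with_groupnorm(child)
    return module

net = replace_batchnorm_with_groupnorm(
    nn.Sequential(nn.Conv2d(3, 8, 1), nn.BatchNorm2d(8), nn.ReLU()))
```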

3. Incorrect Output Transformation

Symptom: Test fails even for the identity transform
Cause: output_transform doesn't match the model's output type
Fix: Verify the output transformation matches the layer's output representation

4. Numerical Precision Issues

Symptom: Small but non-zero error everywhere
Cause: Floating-point accumulation, interpolation
Fix: Use float64 for testing; accept a small tolerance
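A sketch of the float64 check (example model and transform are assumptions): rerun the failing comparison with model and inputs cast to double. If the error drops by several orders of magnitude, the "failure" was float32 round-off, not a real equivariance bug:

```python
import torch
import torch.nn as nn

def equivariance_err(model, x):
    with torch.no_grad():
        rot = lambda t: torch.rot90(t, 1, dims=(-2, -1))
        return torch.norm(model(rot(x)) - rot(model(x))).item()

model = nn.Conv2d(1, 4, kernel_size=1).eval()   # exactly equivariant layer
x = torch.randn(1, 1, 8, 8)

err32 = equivariance_err(model, x)
err64 = equivariance_err(model.double(), x.double())
# If err32 sits around 1e-6 while err64 sits around 1e-14, the
# discrepancy was precision, not a modeling bug.
```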

5. Custom Layer Bug

Symptom: Error isolated to a specific layer
Cause: Implementation error in a custom equivariant layer
Fix: Review the layer implementation against its equivariance constraints

6. Padding/Boundary Effects

Symptom: Error is higher near the edges
Cause: Padding doesn't respect the symmetry
Fix: Use circular padding or handle boundaries explicitly
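A sketch for translation symmetry (assuming cyclic shifts as the group): with `padding_mode="circular"` a convolution commutes exactly with `torch.roll`, while the same kernel with zero padding only does so away from the borders:

```python
import torch
import torch.nn as nn

shift = lambda t: torch.roll(t, shifts=3, dims=-1)  # cyclic shift along W

circ = nn.Conv2d(1, 1, 3, padding=1, padding_mode="circular").eval()
zero = nn.Conv2d(1, 1, 3, padding=1, padding_mode="zeros").eval()
zero.load_state_dict(circ.state_dict())  # identical weights, different padding

x = torch.randn(1, 1, 8, 8)
with torch.no_grad():
    err_circ = torch.norm(circ(shift(x)) - shift(circ(x))).item()
    err_zero = torch.norm(zero(shift(x)) - shift(zero(x))).item()
# err_circ stays at round-off scale; err_zero is dominated by the
# columns that wrapped across the border.
```

Note that circular padding asserts wrap-around topology; if the data is not periodic, explicit boundary handling is the honest alternative.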

Output Template

MODEL EQUIVARIANCE AUDIT REPORT
===============================

Model: [Model name/description]
Intended Symmetry: [Group]
Symmetry Type: [Invariant/Equivariant]

END-TO-END TESTS:
-----------------
Test samples: [N]
Transformations tested: [M]

Invariance/Equivariance Error:
- Mean absolute: [value]
- Max absolute: [value]
- Mean relative: [value]
- Max relative: [value]
- RESULT: [PASS/FAIL]

LAYER-WISE ANALYSIS:
--------------------
[For each layer]
- Layer: [name]
- Error: [value]
- Result: [PASS/FAIL]

GRADIENT TEST:
--------------
- Gradient equivariance error: [value]
- RESULT: [PASS/FAIL]

IDENTIFIED ISSUES:
------------------
1. [Issue description]
   - Location: [layer/component]
   - Severity: [High/Medium/Low]
   - Recommended fix: [description]

OVERALL VERDICT: [PASS/FAIL/NEEDS_ATTENTION]

Recommendations:
- [List of actions needed]