Link Validator (link-validator)

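Link Validator is an automated tool for documentation quality control, dedicated to checking and validating links. It scans documents for internal links and external URLs, validates anchor references, detects redirects, monitors link rot, and validates sitemaps. It also generates automated reports and suggests Archive.org fallback links, which makes it suitable for technical documentation maintenance, website management, and content quality assurance. Keywords: link checking, URL validation, documentation maintenance, automated testing, link rot monitoring, sitemap validation, technical documentation, quality assurance.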

DevOps · Updated 2/26/2026

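name: link-validator
description: Documentation link checking and validation. Validates internal links, external URLs, and anchors; detects redirects; monitors link rot; generates sitemap validation reports.
allowed-tools: Read, Write, Edit, Bash, Glob, Grep
backlog-id: SK-009
metadata:
  author: babysitter-sdk
  version: "1.0.0"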

Link Validation Skill

Comprehensive link checking and validation for documentation.

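Features

  • Internal link validation (cross-references)
  • External URL checking with retry logic
  • Anchor/fragment validation
  • Redirect detection and updating
  • Link rot monitoring and reporting
  • Archive.org fallback suggestions
  • sitemap.xml validation
  • Link accessibility checks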

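When to Use

Invoke this skill when you need to:

  • Validate all links in the documentation
  • Check for broken external URLs
  • Validate anchor references
  • Detect and fix redirects
  • Monitor link health over time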

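Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| inputPath | string | | Path to the documentation directory |
| action | string | | validate, monitor, fix-redirects |
| checkExternal | boolean | | Check external URLs (default: true) |
| timeout | number | | Request timeout (seconds) |
| retries | number | | Number of retries for failed requests |
| allowedDomains | array | | Domains that are always allowed |
| blockedDomains | array | | Domains to skip when checking |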

Input Example

{
  "inputPath": "./docs",
  "action": "validate",
  "checkExternal": true,
  "timeout": 30,
  "retries": 3
}
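
An input like the one above could be checked before a run starts. The following is a minimal sketch, not the skill's own code: `normalizeInput` is a hypothetical helper, and it assumes `inputPath` and `action` are required. The only default it applies is the documented `checkExternal: true`; other omitted fields are left unset rather than guessed.

```javascript
// Hypothetical helper: validates the skill input and applies the one
// documented default (checkExternal: true). Not part of the skill itself.
const ACTIONS = ['validate', 'monitor', 'fix-redirects'];

function normalizeInput(input) {
  // Assumption: inputPath and action are both required.
  if (typeof input.inputPath !== 'string' || input.inputPath === '') {
    throw new TypeError('inputPath must be a non-empty string');
  }
  if (!ACTIONS.includes(input.action)) {
    throw new RangeError(`action must be one of: ${ACTIONS.join(', ')}`);
  }
  for (const key of ['timeout', 'retries']) {
    const value = input[key];
    if (value !== undefined && !(Number.isFinite(value) && value >= 0)) {
      throw new TypeError(`${key} must be a non-negative number`);
    }
  }
  for (const key of ['allowedDomains', 'blockedDomains']) {
    if (input[key] !== undefined && !Array.isArray(input[key])) {
      throw new TypeError(`${key} must be an array`);
    }
  }
  return { ...input, checkExternal: input.checkExternal ?? true };
}
```

Rejecting an unknown `action` early keeps a typo such as `fix-redirect` from silently running the wrong mode.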

Output Structure

Validation Report

{
  "summary": {
    "total": 342,
    "valid": 325,
    "broken": 12,
    "redirected": 5,
    "skipped": 0
  },
  "internal": {
    "total": 180,
    "valid": 178,
    "broken": 2
  },
  "external": {
    "total": 162,
    "valid": 147,
    "broken": 10,
    "redirected": 5
  },
  "issues": [
    {
      "type": "broken",
      "url": "https://api.example.com/v1/docs",
      "status": 404,
      "source": {
        "file": "docs/api/authentication.md",
        "line": 42,
        "text": "[API Documentation](https://api.example.com/v1/docs)"
      },
      "suggestion": {
        "archived": "https://web.archive.org/web/20250101/https://api.example.com/v1/docs",
        "alternative": null
      }
    },
    {
      "type": "redirect",
      "url": "http://example.com/old-page",
      "redirectTo": "https://example.com/new-page",
      "status": 301,
      "source": {
        "file": "docs/guides/migration.md",
        "line": 15
      },
      "suggestion": "Update to: https://example.com/new-page"
    },
    {
      "type": "anchor-missing",
      "url": "api/users.md#create-user",
      "source": {
        "file": "docs/quickstart.md",
        "line": 28
      },
      "suggestion": "Heading 'create-user' not found. Available headings: create, update, delete"
    }
  ],
  "performance": {
    "duration": 45.2,
    "requestsMade": 162,
    "avgResponseTime": 245
  }
}
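
A report in this shape can be post-processed before it is shown to a reviewer or used to gate a build. The sketch below is illustrative only: `groupIssuesByFile` and `hasBlockingIssues` are hypothetical names, and the rule that redirects do not fail a run is an assumed policy, not something the skill defines.

```javascript
// Hypothetical helper: groups report issues by source file so that each
// file's problems can be reviewed together.
function groupIssuesByFile(report) {
  const byFile = new Map();
  for (const issue of report.issues) {
    const file = issue.source.file;
    if (!byFile.has(file)) byFile.set(file, []);
    byFile.get(file).push({
      line: issue.source.line,
      type: issue.type,
      url: issue.url
    });
  }
  return byFile;
}

// Assumed policy: broken links and missing anchors fail the run,
// while redirects are only reported.
function hasBlockingIssues(report) {
  return report.issues.some(
    (issue) => issue.type === 'broken' || issue.type === 'anchor-missing'
  );
}
```

A CI wrapper could, for example, exit with a non-zero code when `hasBlockingIssues(report)` returns `true`.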

Configuration

linkcheck.config.json

{
  "input": "./docs",
  "output": "./reports/linkcheck.json",
  "options": {
    "checkExternal": true,
    "checkAnchors": true,
    "checkImages": true,
    "followRedirects": true,
    "timeout": 30000,
    "retries": 3,
    "retryDelay": 1000,
    "concurrency": 10,
    "userAgent": "Mozilla/5.0 LinkChecker/1.0"
  },
  "allowed": {
    "statusCodes": [200, 201, 204],
    "domains": ["localhost", "127.0.0.1"],
    "patterns": ["^https://internal\\.example\\.com"]
  },
  "blocked": {
    "domains": ["archive.org"],
    "patterns": ["^https://twitter\\.com"]
  },
  "replacements": {
    "http://example.com": "https://example.com",
    "/docs/v1/": "/docs/v2/"
  }
}
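
How the `allowed`, `blocked`, and `replacements` sections are applied is not spelled out here, so the following is a sketch under stated assumptions: blocked entries take precedence over allowed ones, a domain entry also covers its subdomains, and replacements are plain substring substitutions applied in order. `classifyUrl` and `applyReplacements` are hypothetical helper names.

```javascript
// Hypothetical helper: decides how a URL is handled under the config above.
// Assumptions: "blocked" wins over "allowed", and a domain entry also
// matches its subdomains.
function classifyUrl(url, config) {
  const { hostname } = new URL(url);
  const matches = (group) =>
    (group.domains ?? []).some(
      (domain) => hostname === domain || hostname.endsWith(`.${domain}`)
    ) ||
    (group.patterns ?? []).some((pattern) => new RegExp(pattern).test(url));

  if (matches(config.blocked ?? {})) return 'skip';   // never requested
  if (matches(config.allowed ?? {})) return 'allow';  // accepted without a request
  return 'check';                                     // validated normally
}

// Applies the "replacements" map as ordered substring substitutions
// (assumed semantics).
function applyReplacements(url, replacements = {}) {
  return Object.entries(replacements).reduce(
    (result, [from, to]) => result.split(from).join(to),
    url
  );
}
```

With the sample config, `http://example.com/docs/v1/intro` would first have its scheme upgraded and then its version segment rewritten, giving `https://example.com/docs/v2/intro`.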

Link Types

Internal Links

<!-- Relative path links -->
[Getting Started](./getting-started.md)
[API Reference](../api/index.md)

<!-- Anchor links -->
[Configuration](#configuration)
[API Users](./api/users.md#create-user)

<!-- Image links -->
![Architecture diagram](./images/architecture.png)

External Links

<!-- Standard external links -->
[GitHub](https://github.com)
[Documentation](https://docs.example.com/guide)

<!-- Links with anchors -->
[MDN Array](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array#instance_methods)
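
The link forms shown in this section can be pulled out of a Markdown file and labeled with a small extractor. This is a sketch rather than the skill's parser: `extractLinks` is a hypothetical function, and because it is regex-based it ignores reference-style links and autolinks and does not skip links that appear inside code blocks.

```javascript
// Hypothetical extractor: finds inline Markdown links and images and labels
// each one as external, anchor (same file), or internal (relative path).
const LINK_PATTERN = /(!?)\[([^\]]*)\]\(([^)\s]+)(?:\s+"[^"]*")?\)/g;

function extractLinks(markdown) {
  const links = [];
  markdown.split('\n').forEach((lineText, index) => {
    for (const match of lineText.matchAll(LINK_PATTERN)) {
      const [, bang, text, target] = match;
      let kind;
      if (/^https?:\/\//i.test(target)) kind = 'external';
      else if (target.startsWith('#')) kind = 'anchor';
      else kind = 'internal';
      links.push({
        line: index + 1,        // 1-based, as in the report's source.line
        text,
        target,
        kind,
        isImage: bang === '!'
      });
    }
  });
  return links;
}
```

Recording the line number at extraction time is what lets a later report point at the exact source location of each issue.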

Validation Rules

Internal Link Rules

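const fs = require('fs');

// resolvePath, extractHeadings, slugify, and findActualPath are helpers
// assumed to be defined elsewhere in the skill.
const internalRules = {
  // The target file must exist
  fileExists: {
    severity: 'error',
    check: (link, context) => {
      const [file] = link.split('#');
      if (!file) return true; // same-file anchor such as "#configuration"
      const resolvedPath = resolvePath(file, context.file);
      return fs.existsSync(resolvedPath);
    }
  },

  // The anchor must exist in the target file
  anchorExists: {
    severity: 'error',
    check: (link, context) => {
      const [file, anchor] = link.split('#');
      if (!anchor) return true;
      // An empty file part means the anchor points into the current file
      const target = file ? resolvePath(file, context.file) : context.file;
      const headings = extractHeadings(target);
      return headings.some(h => slugify(h) === anchor);
    }
  },

  // The link's casing must match the path on disk
  caseSensitive: {
    severity: 'warning',
    check: (link, context) => {
      const actual = findActualPath(link);
      return link === actual;
    }
  }
};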
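
The anchor rule relies on `slugify` and `extractHeadings`, which are not shown in this document. One possible shape for them is sketched below, assuming GitHub-style anchors (lower-cased, punctuation dropped, spaces turned into hyphens); other renderers generate anchors differently, so the slug rule is an assumption, and `headingsFromMarkdown` is a hypothetical name.

```javascript
const fs = require('node:fs');

// Approximates GitHub-style heading anchors: lower-case, drop punctuation,
// turn each whitespace character into a hyphen.
function slugify(heading) {
  return heading
    .trim()
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s_-]/gu, '')
    .replace(/\s/g, '-');
}

// Returns the text of every ATX heading ("# Title") outside fenced code.
function headingsFromMarkdown(markdown) {
  const headings = [];
  let inFence = false;
  for (const line of markdown.split('\n')) {
    if (/^\s*(`{3}|~{3})/.test(line)) {
      inFence = !inFence; // entering or leaving a fenced code block
      continue;
    }
    const match = inFence ? null : /^#{1,6}\s+(.+?)\s*#*\s*$/.exec(line);
    if (match) headings.push(match[1]);
  }
  return headings;
}

// File-based wrapper with the call shape used by the anchor rule.
function extractHeadings(filePath) {
  return headingsFromMarkdown(fs.readFileSync(filePath, 'utf8'));
}
```

Skipping fenced code matters because shell comments such as `# install` would otherwise be mistaken for headings.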

External Link Rules

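// isLocalhost is a helper assumed to be defined elsewhere in the skill.
const externalRules = {
  // The URL must return a successful status
  statusOk: {
    severity: 'error',
    check: async (url) => {
      const response = await fetch(url, { method: 'HEAD' });
      return response.ok;
    }
  },

  // Prefer HTTPS
  httpsPreferred: {
    severity: 'warning',
    check: (url) => {
      return url.startsWith('https://') || isLocalhost(url);
    }
  },

  // No redirects (otherwise update the link to the final URL)
  noRedirects: {
    severity: 'info',
    check: async (url) => {
      // With redirect: 'manual', Node.js fetch returns the 3xx response itself
      const response = await fetch(url, { redirect: 'manual' });
      const isRedirect = response.status >= 300 && response.status < 400;
      return !(isRedirect && response.headers.get('location'));
    }
  }
};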
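
The rules above issue a single request each, while the feature list and config mention retries and timeouts. A minimal sketch of how a check might be wrapped follows. `withRetry` and `headWithTimeout` are hypothetical helpers, the fixed delay between attempts is an assumption, and the default values are taken from the sample config (`retries: 3`, `retryDelay: 1000`).

```javascript
// Resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: runs an async check and retries it when it throws,
// waiting a fixed delay between attempts.
async function withRetry(check, { retries = 3, retryDelay = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await check();
    } catch (error) {
      lastError = error; // network failure or timeout
      if (attempt < retries) await sleep(retryDelay);
    }
  }
  throw lastError;
}

// Example check: a HEAD request that aborts after `timeoutMs` milliseconds.
// Uses the global fetch and AbortSignal.timeout available in Node.js 18+.
async function headWithTimeout(url, timeoutMs) {
  const response = await fetch(url, {
    method: 'HEAD',
    signal: AbortSignal.timeout(timeoutMs)
  });
  return { status: response.status, ok: response.ok };
}
```

A call such as `withRetry(() => headWithTimeout(url, 30000), { retries: 3 })` retries only thrown errors; an HTTP 404 is returned as a result on the first attempt, since repeating it is unlikely to help.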

Link Rot Monitoring

Scheduled Checks

# .github/workflows/link-check.yml
name: Link Check

on:
  schedule:
    - cron: '0 0 * * 0'  # Every Sunday at 00:00 UTC
  workflow_dispatch:

jobs:
  check-links:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      issues: write  # lets the github-script step open an issue
    steps:
      - uses: actions/checkout@v4

      - name: Check links
        uses: lycheeverse/lychee-action@v1
        with:
          args: --verbose --no-progress './docs/**/*.md'
          fail: true

      - name: Create issue on failure
        if: failure()
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: 'Broken links detected',
              body: 'The weekly link check found broken links. See the workflow run for details.',
              labels: ['documentation', 'bug']
            })

History Tracking

{
  "history": [
    {
      "date": "2026-01-24",
      "total": 342,
      "broken": 12,
      "new_broken": 3,
      "fixed": 1
    },
    {
      "date": "2026-01-17",
      "total": 340,
      "broken": 10,
      "new_broken": 2,
      "fixed": 0
    }
  ],
  "trends": {
    "avg_broken_per_week": 2.5,
    "most_problematic_domains": [
      { "domain": "api.example.com", "broken_count": 5 },
      { "domain": "old-docs.example.com", "broken_count": 3 }
    ]
  }
}
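
The `new_broken` and `fixed` counters in each history entry can be derived by comparing the broken URLs of two consecutive runs. The sketch below assumes each run keeps a list of its broken URLs; `diffRuns` is a hypothetical helper, and a URL that was simply removed from the docs would also count as fixed under this definition.

```javascript
// Hypothetical helper: compares the broken URLs of two runs and builds a
// history entry with the same field names as the example above.
function diffRuns(previousBroken, currentBroken, date, total) {
  const previous = new Set(previousBroken);
  const current = new Set(currentBroken);
  const newlyBroken = [...current].filter((url) => !previous.has(url));
  const fixed = [...previous].filter((url) => !current.has(url));
  return {
    date,
    total,
    broken: current.size,
    new_broken: newlyBroken.length,
    fixed: fixed.length
  };
}
```

Using sets means a URL that appears in several files is counted once per run, which keeps week-to-week comparisons stable.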

Archive.org Integration

Fallback Suggestions

async function findArchiveUrl(brokenUrl) {
  const archiveApi = `https://archive.org/wayback/available?url=${encodeURIComponent(brokenUrl)}`;

  try {
    const response = await fetch(archiveApi);
    const data = await response.json();

    if (data.archived_snapshots?.closest) {
      return {
        available: true,
        url: data.archived_snapshots.closest.url,
        timestamp: data.archived_snapshots.closest.timestamp
      };
    }
  } catch (error) {
    // Archive.org is unavailable; fall through to the default result
  }

  return { available: false };
}
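
The `timestamp` returned for a snapshot is a 14-digit `YYYYMMDDhhmmss` string in UTC, so it needs parsing before it can be compared with dates. The following sketch shows one way to do that. `parseWaybackTimestamp` and `isRecentSnapshot` are hypothetical helpers, and preferring snapshots newer than some age limit is an assumed policy, not part of the skill.

```javascript
// Converts a Wayback Machine timestamp ("YYYYMMDDhhmmss", UTC) to a Date.
// Returns null when the value does not have that 14-digit shape.
function parseWaybackTimestamp(timestamp) {
  const match = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(timestamp);
  if (!match) return null;
  const [year, month, day, hour, minute, second] = match.slice(1).map(Number);
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second));
}

// Assumed policy: only suggest snapshots taken within `maxAgeDays`.
function isRecentSnapshot(timestamp, maxAgeDays, now = new Date()) {
  const taken = parseWaybackTimestamp(timestamp);
  if (!taken) return false;
  const ageInDays = (now - taken) / 86_400_000; // milliseconds per day
  return ageInDays <= maxAgeDays;
}
```

A very old snapshot may describe an API that no longer matches the surrounding documentation, which is why a report might flag it differently from a recent one.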

Sitemap Validation

sitemap.xml Checks

async function validateSitemap(sitemapUrl) {
  const response = await fetch(sitemapUrl);
  const xml = await response.text();
  const urls = parseSitemapXml(xml);

  const results = await Promise.all(
    urls.map(async (url) => {
      const check = await checkUrl(url.loc);
      return {
        url: url.loc,
        lastmod: url.lastmod,
        status: check.status,
        valid: check.valid
      };
    })
  );

  return {
    total: urls.length,
    valid: results.filter(r => r.valid).length,
    invalid: results.filter(r => !r.valid),
    missingLastmod: results.filter(r => !r.lastmod).length
  };
}
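
`validateSitemap` depends on a `parseSitemapXml` helper that is not shown. A possible stand-in is sketched below using regular expressions. It is an illustration only: it does not handle CDATA sections or sitemap index files, and a proper XML parser would be the safer choice in practice.

```javascript
// Hypothetical stand-in for parseSitemapXml: returns one { loc, lastmod }
// object per <url> entry, with lastmod set to null when it is absent.
function parseSitemapXml(xml) {
  // Decode the five predefined XML entities (&amp; last, to avoid double-decoding).
  const decode = (text) =>
    text
      .replace(/&lt;/g, '<')
      .replace(/&gt;/g, '>')
      .replace(/&quot;/g, '"')
      .replace(/&apos;/g, "'")
      .replace(/&amp;/g, '&');

  // Extracts the trimmed text content of the first <name> element in a block.
  const tag = (block, name) => {
    const match = new RegExp(`<${name}>\\s*([^<]*?)\\s*</${name}>`).exec(block);
    return match ? decode(match[1]) : null;
  };

  return [...xml.matchAll(/<url>([\s\S]*?)<\/url>/g)]
    .map(([, block]) => ({ loc: tag(block, 'loc'), lastmod: tag(block, 'lastmod') }))
    .filter((entry) => entry.loc);
}
```

Returning `null` for a missing `lastmod` fits the `missingLastmod` count in `validateSitemap`, which treats any falsy value as missing.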

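Workflow

  1. Scan files - find all Markdown files
  2. Extract links - parse internal and external links
  3. Validate internal links - check that files and anchors exist
  4. Validate external links - HTTP requests with retries
  5. Check anchors - validate fragment identifiers
  6. Detect redirects - record permanent redirects
  7. Generate report - output findings and suggestions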
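
The first step of the workflow above, scanning for Markdown files, could look like the following sketch. `findMarkdownFiles` is a hypothetical function, and skipping `node_modules` and dot-directories is an assumption about what a documentation scan would want to ignore.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Step 1 sketch: recursively collects Markdown files under `dir`.
// Assumption: node_modules and dot-directories are not documentation.
function findMarkdownFiles(dir) {
  const files = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const fullPath = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      if (entry.name === 'node_modules' || entry.name.startsWith('.')) continue;
      files.push(...findMarkdownFiles(fullPath));
    } else if (entry.isFile() && /\.(md|markdown)$/i.test(entry.name)) {
      files.push(fullPath);
    }
  }
  return files.sort(); // stable order keeps reports comparable between runs
}
```

The resulting paths feed the later steps: each file is read, its links are extracted, and the file path is kept as the `source.file` of any issue found.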

Dependencies

{
  "devDependencies": {
    "linkinator": "^6.0.0",
    "markdown-link-check": "^3.11.0",
    "lychee": "^0.14.0",
    "node-fetch": "^3.3.0"
  }
}

CLI Commands

# Check all links
npx linkinator ./docs --recurse --format json > report.json

# Check with markdown-link-check
find docs -name '*.md' -exec npx markdown-link-check {} \;

# Use lychee (Rust-based, fast)
lychee './docs/**/*.md' --format json --output report.json

# Automatically fix redirects
node scripts/fix-redirects.js --input docs/ --report report.json
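
The contents of `scripts/fix-redirects.js` are not included in this document. As a rough sketch of what its core might do, the function below rewrites redirected URLs in one file's text using the `redirect` issues from a validation report. `applyRedirectFixes` is a hypothetical name, and restricting the rewrite to inline Markdown link targets is an assumption made to avoid touching longer URLs that share a prefix.

```javascript
// Hypothetical core of a fix-redirects script: replaces each redirected URL
// with its redirectTo target, but only where it is the full target of an
// inline Markdown link, i.e. "](url)" or "](url "title")".
function applyRedirectFixes(text, issues, file) {
  let updated = text;
  let changes = 0;
  for (const issue of issues) {
    if (issue.type !== 'redirect' || issue.source.file !== file) continue;
    // Escape regex metacharacters so the URL is matched literally.
    const escaped = issue.url.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // The lookahead requires ")" or whitespace right after the URL.
    const pattern = new RegExp(`(\\]\\()${escaped}(?=[\\s)])`, 'g');
    updated = updated.replace(pattern, (_match, prefix) => {
      changes += 1;
      return prefix + issue.redirectTo;
    });
  }
  return { text: updated, changes };
}
```

Returning the number of changes lets a calling script write the file back only when something was actually rewritten.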

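Best Practices

  • Run link checks in CI/CD
  • Monitor external links weekly
  • Update redirected links promptly
  • Use relative links for internal references
  • Include archive.org fallback links for important links
  • Allowlist known-good domains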

References

Target Processes

  • docs-testing.js
  • docs-audit.js
  • docs-pr-workflow.js