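Documentation Sources Skill

Understand documentation platforms and how to fetch content from them efficiently. Supports 20+ platforms and frameworks.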

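Variables

Variable	Default	Description
PREFER_LLMSTXT	true	Check llms.txt before any other strategy
PREFER_GITHUB	true	Prefer raw GitHub content over web crawling
BROWSER_FALLBACK	true	Use browser automation when curl fails
DEFAULT_STRATEGY	web_crawl	Fallback strategy when detection fails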
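
As a rough illustration of how these variables and their defaults might be consumed, the sketch below loads them into a small config object. It assumes the variables arrive as environment-style strings, which this document does not state; `DocSourceConfig` and `from_env` are hypothetical names introduced only for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _env_bool(name: str, default: bool, env: Mapping[str, str]) -> bool:
    """Return the default when unset; otherwise only 'true' (any case) is True."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


@dataclass(frozen=True)
class DocSourceConfig:
    # Defaults mirror the Variables table above.
    prefer_llmstxt: bool = True
    prefer_github: bool = True
    browser_fallback: bool = True
    default_strategy: str = "web_crawl"

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "DocSourceConfig":
        # Assumption: variables are supplied as environment-style strings.
        env = os.environ if env is None else env
        return cls(
            prefer_llmstxt=_env_bool("PREFER_LLMSTXT", True, env),
            prefer_github=_env_bool("PREFER_GITHUB", True, env),
            browser_fallback=_env_bool("BROWSER_FALLBACK", True, env),
            default_strategy=env.get("DEFAULT_STRATEGY", "web_crawl"),
        )
```

Passing an explicit mapping, for example `DocSourceConfig.from_env({"BROWSER_FALLBACK": "false"})`, keeps the sketch easy to exercise without touching the real environment.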

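Instructions

MANDATORY - Follow the Workflow steps below when adding or analyzing a documentation source.

Always check for AI-native signals (llms.txt) first
Prefer raw content over rendered HTML
Use browser automation only when necessary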

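Red Flags - STOP and Reconsider

If you are about to:

Crawl a website without checking for llms.txt
Use browser automation without trying curl first
Add a source without detecting its documentation framework
Skip raw GitHub access when a repository is available

STOP -> Read the appropriate cookbook file -> Follow the detection order -> Then proceed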

Quick Decision Tree

What documentation are you adding?
│
├─ Has /llms.txt? ────────────────────────► llmstxt-strategy.md
│
├─ GitHub repo available?
│   └─ Has docs.yml/mint.json/mkdocs.yml? ► github-strategy.md
│
├─ Has /openapi.json or /asyncapi.yaml? ──► openapi-strategy.md
│
├─ Has /sitemap.xml? ─────────────────────► sitemap-strategy.md
│
├─ Curl returns <1KB? (JS-rendered) ──────► browser-strategy.md
│
└─ Nothing else works? ───────────────────► sitemap-strategy.md (crawl mode)

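Workflow

[ ] Identify the documentation URL
[ ] CHECKPOINT: Check for AI-native signals (llms.txt, ai.txt)
[ ] Check whether a GitHub repository is available
[ ] Detect the documentation framework from its config files
[ ] Select the appropriate strategy from the cookbook
[ ] CHECKPOINT: Test a curl fetch before using the browser
[ ] Configure the registry entry with the correct strategy
[ ] Verify that fetching works correctly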

Strategy Priority Order

Priority	Signal	Strategy	Cookbook
1	`/llms.txt` exists	`llmstxt`	`cookbook/llmstxt-strategy.md`
2	GitHub repo available	`github_raw`	`cookbook/github-strategy.md`
3	OpenAPI/AsyncAPI spec	`openapi`	`cookbook/openapi-strategy.md`
4	`/sitemap.xml` exists	`web_sitemap`	`cookbook/sitemap-strategy.md`
5	JS-rendered / curl fails	`browser_crawl`	`cookbook/browser-strategy.md`
6	Nothing else works	`web_crawl`	`cookbook/sitemap-strategy.md`
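
The priority order can be read as a first-match lookup. The sketch below is one possible encoding, not the skill's actual implementation; the `Signals` class and its field names are hypothetical, while the strategy names and cookbook paths are copied from the table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Signals:
    # Hypothetical flags, one per row of the priority table.
    has_llms_txt: bool = False
    has_github_repo: bool = False
    has_api_spec: bool = False
    has_sitemap: bool = False
    js_rendered: bool = False


# (signal attribute, strategy, cookbook), highest priority first.
PRIORITY = [
    ("has_llms_txt", "llmstxt", "cookbook/llmstxt-strategy.md"),
    ("has_github_repo", "github_raw", "cookbook/github-strategy.md"),
    ("has_api_spec", "openapi", "cookbook/openapi-strategy.md"),
    ("has_sitemap", "web_sitemap", "cookbook/sitemap-strategy.md"),
    ("js_rendered", "browser_crawl", "cookbook/browser-strategy.md"),
]
FALLBACK = ("web_crawl", "cookbook/sitemap-strategy.md")


def select_strategy(signals: Signals) -> Tuple[str, str]:
    """Return (strategy, cookbook) for the first signal that is set."""
    for attr, strategy, cookbook in PRIORITY:
        if getattr(signals, attr):
            return strategy, cookbook
    return FALLBACK
```

For a site with both a sitemap and an llms.txt, `select_strategy(Signals(has_llms_txt=True, has_sitemap=True))` picks `llmstxt`, because the higher-priority signal wins.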

Cookbook

llms.txt (AI-Native)

IF: The site has /llms.txt or /llms-full.txt
THEN: Read cookbook/llmstxt-strategy.md
CHECK: curl -sI {url}/llms.txt | head -1
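
The CHECK command prints only the HTTP status line, so something still has to interpret it. A minimal sketch of that interpretation follows; `status_line_ok` is a hypothetical helper, and treating only 2xx as "present" is an assumption rather than a rule stated in this skill.

```python
def status_line_ok(status_line: str) -> bool:
    """Return True when a status line such as 'HTTP/2 200' reports a 2xx code."""
    parts = status_line.strip().split()
    if len(parts) < 2 or not parts[0].upper().startswith("HTTP/"):
        return False
    try:
        code = int(parts[1])
    except ValueError:
        return False
    return 200 <= code < 300
```

Note that `curl -sI` does not follow redirects, so a 3xx status line is reported as "not present" here; adding curl's `-L` flag would make the check follow the redirect first. Some sites also answer 200 with an HTML error page, so a positive result may still deserve a look at the body.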

GitHub Raw

IF: A GitHub repo is linked from the documentation site
THEN: Read cookbook/github-strategy.md
SUPPORTS: Fern, Docusaurus, MkDocs, Mintlify, Sphinx, Nextra, Starlight

OpenAPI / AsyncAPI / GraphQL

IF: The site exposes an API spec (/openapi.json, /asyncapi.yaml, /graphql)
THEN: Read cookbook/openapi-strategy.md

Sitemap / Web Crawl

IF: The site has /sitemap.xml or needs crawling
THEN: Read cookbook/sitemap-strategy.md

Browser Automation

IF: Curl fails (JS-rendered, blocked, <1KB response)
THEN: Read cookbook/browser-strategy.md
ALSO: See the browser-discovery skill for IDE browser tools

Quick Detection Commands

# Check for llms.txt
curl -sI "https://docs.viperjuice.dev/llms.txt" | head -1

# Check for a sitemap
curl -sI "https://docs.viperjuice.dev/sitemap.xml" | head -1

# Check for an OpenAPI spec
curl -sI "https://api.viperjuice.dev/openapi.json" | head -1

# Test curl response size (detects JS rendering)
curl -s "https://docs.viperjuice.dev" | wc -c
# If < 1000 bytes, the page is probably JS-rendered -> use the browser
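
The size check above can be expressed as a small predicate once the response body is in hand. This is only a sketch of the heuristic: `looks_js_rendered` is a hypothetical name, and the 1000-byte threshold is taken from the comment above. The body is measured in bytes, matching `wc -c`, rather than in decoded characters.

```python
def looks_js_rendered(body: bytes, min_bytes: int = 1000) -> bool:
    """Heuristic: a very small raw response suggests a client-rendered shell page."""
    # len() on bytes counts bytes, the same unit `wc -c` reports.
    return len(body) < min_bytes
```

A short shell page such as `b"<html><div id='root'></div></html>"` trips the heuristic, while any body of 1000 bytes or more does not. Small but legitimate pages would also be flagged, so the result is a hint to try the browser strategy, not proof of JS rendering.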

Framework Detection

Framework	Config File	Typical Location
Fern	`docs.yml`	`fern/docs.yml`
Docusaurus	`docusaurus.config.js`	repo root
MkDocs	`mkdocs.yml`	repo root
Mintlify	`mint.json`	repo root
Sphinx	`conf.py`	`docs/conf.py`
Nextra	`_meta.json`	`pages/_meta.json`
Starlight	`astro.config.mjs`	repo root
Antora	`antora.yml`	repo root
GitBook	`SUMMARY.md`	repo root

See reference/framework-detection.md for detailed patterns.
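
Given a listing of repository file paths, the table above can drive a simple lookup. The sketch below matches on file name only, which is a deliberate simplification: a bare `conf.py` or `astro.config.mjs` can exist in repositories that use neither Sphinx nor Starlight, so matches are candidates to confirm, not verdicts. `detect_frameworks` is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# (framework, config file name) pairs copied from the table above, in table order.
FRAMEWORK_CONFIGS = [
    ("Fern", "docs.yml"),
    ("Docusaurus", "docusaurus.config.js"),
    ("MkDocs", "mkdocs.yml"),
    ("Mintlify", "mint.json"),
    ("Sphinx", "conf.py"),
    ("Nextra", "_meta.json"),
    ("Starlight", "astro.config.mjs"),
    ("Antora", "antora.yml"),
    ("GitBook", "SUMMARY.md"),
]


def detect_frameworks(repo_paths: Iterable[str]) -> List[str]:
    """Return candidate frameworks whose config file name appears in the repo."""
    names = {PurePosixPath(p).name for p in repo_paths}
    return [fw for fw, cfg in FRAMEWORK_CONFIGS if cfg in names]
```

For instance, `detect_frameworks(["fern/docs.yml", "docs/conf.py"])` yields both Fern and Sphinx as candidates, which is where the typical-location column helps narrow things down.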

Registry Configuration

See reference/registry-examples.md for configuration templates.

Basic structure:

{
  "source-id": {
    "name": "Display Name",
    "strategy": "strategy-name",
    "paths": {
      "homepage": "https://..."
    }
  }
}
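
To make the basic structure concrete, the sketch below builds one entry and serializes it. It covers only the fields shown above; real entries may carry more keys (see the referenced templates). `make_registry_entry` is a hypothetical helper, and restricting `strategy` to the six names from the priority table is an assumption.

```python
import json

# Strategy names as listed in the Strategy Priority Order table.
KNOWN_STRATEGIES = {
    "llmstxt", "github_raw", "openapi",
    "web_sitemap", "browser_crawl", "web_crawl",
}


def make_registry_entry(source_id: str, name: str, strategy: str, homepage: str) -> dict:
    """Build a single registry entry in the basic structure shown above."""
    if strategy not in KNOWN_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if not homepage.startswith(("http://", "https://")):
        raise ValueError(f"homepage must be an http(s) URL: {homepage!r}")
    return {
        source_id: {
            "name": name,
            "strategy": strategy,
            "paths": {"homepage": homepage},
        }
    }


entry = make_registry_entry("react", "React", "llmstxt", "https://react.dev")
print(json.dumps(entry, indent=2))
```

Rejecting unknown strategy names early is a design choice in this sketch: it surfaces a typo before the entry reaches the registry file.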

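Auto-Discovery from Manifests

When used with the library-detection skill, automatically suggest documentation sources based on the detected dependencies.

Workflow

Receive stack information from the library-detection skill
Map each framework/library to a known documentation source
Check whether the source already exists in ai-docs/libraries/_registry.json
Suggest a /ai-dev-kit:docs-add command for each missing source

Library-to-Documentation Mapping

Library	Docs URL	Strategy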
react	https://react.dev	llmstxt
next	https://nextjs.org/docs	github_raw
vue	https://vuejs.org/guide	github_raw
nuxt	https://nuxt.com/docs	github_raw
svelte	https://svelte.dev/docs	github_raw
fastapi	https://fastapi.tiangolo.com	github_raw
django	https://docs.djangoproject.com	web_sitemap
flask	https://flask.palletsprojects.com	web_sitemap
express	https://expressjs.com	web_sitemap
prisma	https://www.prisma.io/docs	llmstxt
drizzle	https://orm.drizzle.team/docs	github_raw
vitest	https://vitest.dev	github_raw
playwright	https://playwright.dev/docs	github_raw
tailwindcss	https://tailwindcss.com/docs	llmstxt
trpc	https://trpc.io/docs	github_raw
zod	https://zod.dev	llmstxt

Auto-Discovery Commands

When /ai-dev-kit:quickstart-codebase runs, or on manual request:

# Get the detected frameworks
FRAMEWORKS=$(library-detection --output json | jq -r '.frameworks[].name')

# For each framework, check whether its docs are already registered
for fw in $FRAMEWORKS; do
  if ! grep -q "\"$fw\"" ai-docs/libraries/_registry.json; then
    echo "Missing docs for: $fw"
    echo "Suggest: /ai-dev-kit:docs-add <url> $fw"
  fi
done

Output

Returns the following lists:

{
  "already_tracked": ["react", "typescript"],
  "missing": [
    {"library": "fastapi", "url": "https://fastapi.tiangolo.com", "command": "/ai-dev-kit:docs-add https://fastapi.tiangolo.com fastapi"}
  ],
  "unknown": ["custom-internal-lib"]
}
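
One way the three lists could be produced is sketched below. It assumes the registry has been loaded as a dict keyed by source id, as in the basic structure shown earlier, and uses a three-entry excerpt of the library mapping; `classify` and `KNOWN_DOCS` are hypothetical names.

```python
from typing import Dict, Iterable

# Excerpt of the library-to-documentation mapping; extend as needed.
KNOWN_DOCS = {
    "react": "https://react.dev",
    "fastapi": "https://fastapi.tiangolo.com",
    "zod": "https://zod.dev",
}


def classify(detected: Iterable[str], registry: Dict[str, dict],
             known_docs: Dict[str, str] = KNOWN_DOCS) -> dict:
    """Sort detected libraries into already_tracked, missing, and unknown."""
    report = {"already_tracked": [], "missing": [], "unknown": []}
    for lib in detected:
        if lib in registry:
            # Already present as a top-level key in the registry.
            report["already_tracked"].append(lib)
        elif lib in known_docs:
            url = known_docs[lib]
            report["missing"].append({
                "library": lib,
                "url": url,
                "command": f"/ai-dev-kit:docs-add {url} {lib}",
            })
        else:
            # No registry entry and no known documentation URL.
            report["unknown"].append(lib)
    return report
```

Unlike the `grep` check in the shell loop, which matches the quoted name anywhere in the file, this sketch looks only at top-level registry keys, so a library mentioned solely inside another entry's fields is not counted as tracked.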

Output

After adding a source, verify:

Pages are discovered correctly
Content is fetched without errors
The registry entry is valid JSON