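name: web-navigation-strategies
description: Strategic navigation patterns and selector guide for thorough web exploration with Playwright MCP. Provides decision trees, navigation strategies, and site-specific selectors for systematically reading multiple pages. Use when planning how to navigate a website, deciding how deeply to read, or finding the right selectors for Playwright MCP commands.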
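Web Navigation Strategies for Playwright MCP
A strategy guide for systematic web exploration with Playwright MCP tools. This skill provides navigation patterns, not executable code: the `mcp_playwright.*` calls below are schematic pseudocode, so map each one onto the equivalent tool your Playwright MCP server actually exposes.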
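Reading-Depth Decision Framework
1. Analyze user intent
Query analysis decision tree:
├── Keywords: "간단히, 요약, 훑어, quick, summary, overview" (briefly, summary, skim)
│ → **Quick mode**
│ → Extract: title + first paragraph only
│ → Pages: 30-50 (fast scan)
│
├── Keywords: "자세히, 상세, 모든, 댓글, detailed, comprehensive, everything" (in detail, detailed, all, comments)
│ → **Deep mode**
│ → Extract: full content + comments + metadata + related links
│ → Pages: 5-10 (thorough reading)
│
└── No specific keywords
    → **Standard mode**
    → Extract: main content + basic metadata
    → Pages: 15-20 (balanced)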
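The decision tree above can be sketched as an ordered keyword lookup. This is a minimal illustration under stated assumptions, not the skill's own implementation: `DEPTH_RULES`, `chooseDepth`, and the field names are hypothetical, matching is a plain case-insensitive substring test (so the stem 훑어 also matches 훑어봐), and the page budgets are simply the ranges listed in the tree.

```javascript
// Hypothetical keyword table mirroring the decision tree.
// Rules are checked in order; the first rule with a matching keyword wins.
const DEPTH_RULES = [
  {
    mode: "quick",
    keywords: ["간단히", "요약", "훑어", "quick", "summary", "overview"],
    extract: ["title", "firstParagraph"],
    pages: [30, 50],
  },
  {
    mode: "deep",
    keywords: ["자세히", "상세", "모든", "댓글", "detailed", "comprehensive", "everything"],
    extract: ["content", "comments", "metadata", "relatedLinks"],
    pages: [5, 10],
  },
];

// Fallback plan when no keyword matches.
const STANDARD = { mode: "standard", extract: ["content", "metadata"], pages: [15, 20] };

function chooseDepth(query) {
  // Lowercasing makes English keywords case-insensitive; Korean text is unaffected.
  const q = query.toLowerCase();
  for (const rule of DEPTH_RULES) {
    if (rule.keywords.some((kw) => q.includes(kw))) {
      return { mode: rule.mode, extract: rule.extract, pages: rule.pages };
    }
  }
  return { ...STANDARD };
}

console.log(chooseDepth("뉴스 헤드라인 훑어봐").mode); // quick
console.log(chooseDepth("포럼 댓글까지 분석해").mode); // deep
console.log(chooseDepth("블로그 글 10개 읽어줘").mode); // standard
```

Rules are checked top to bottom, so a query that mixes quick and deep keywords resolves to quick; reordering `DEPTH_RULES` flips that priority.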
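2. Execute according to depth
Quick mode strategy
// Titles and summaries only
mcp_playwright.navigate(url)
const count = Math.min(mcp_playwright.query_selector_all('h2 a, h3 a').length, 50)
for (let i = 0; i < count; i++) {
  // Re-query on every pass: handles found before click()/go_back() belong to a page that is gone
  const link = mcp_playwright.query_selector_all('h2 a, h3 a')[i]
  mcp_playwright.click(link)
  mcp_playwright.wait_for_load_state('domcontentloaded')  // a lighter wait is enough for a headline scan
  const title = mcp_playwright.get_text('h1')
  const summary = mcp_playwright.get_text('article p, main p')  // first paragraph of the body
  mcp_playwright.go_back()
}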
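Standard mode strategy
// Full main content
mcp_playwright.navigate(url)
const count = Math.min(mcp_playwright.query_selector_all('article a').length, 20)
for (let i = 0; i < count; i++) {
  const link = mcp_playwright.query_selector_all('article a')[i]  // re-query after each go_back()
  mcp_playwright.click(link)
  mcp_playwright.wait_for_load_state('networkidle')
  const content = mcp_playwright.get_text('main, article, .content')
  mcp_playwright.go_back()
  mcp_playwright.wait_for_load_state('networkidle')
}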
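Deep mode strategy
// Everything, including the discussion
mcp_playwright.navigate(url)
const count = Math.min(mcp_playwright.query_selector_all('article a').length, 10)
for (let i = 0; i < count; i++) {
  const link = mcp_playwright.query_selector_all('article a')[i]  // re-query after each go_back()
  mcp_playwright.click(link)
  mcp_playwright.wait_for_load_state('networkidle')
  // Main content
  const content = mcp_playwright.get_text('article, main')
  // Comments: keep every comment's text, not just the last one
  const comment_texts = mcp_playwright.query_selector_all('.comment')
    .map(comment => mcp_playwright.get_text(comment))
  // Metadata
  const author = mcp_playwright.get_text('.author, .writer')
  const date = mcp_playwright.get_text('time, .date')
  // Related links: store URLs, since element handles go stale after go_back()
  const related = mcp_playwright.query_selector_all('.related a')
    .map(a => mcp_playwright.get_attribute(a, 'href'))
  mcp_playwright.go_back()
  mcp_playwright.wait_for_load_state('networkidle')
}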
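All three strategies share the same outer loop: visit at most N links, extract what the mode calls for, and keep going when a single page fails. A minimal sketch of that loop, assuming a hypothetical async `extractPage(url)` callback that would wrap the per-mode `mcp_playwright` calls; `visitWithBudget` and the stub `fakeExtract` are illustrative names, and the stub lets the example run without a browser.

```javascript
// Visit up to `limit` URLs in order. Failed pages count against the budget
// but do not abort the run. `extractPage` is a hypothetical async callback
// standing in for the per-mode mcp_playwright calls.
async function visitWithBudget(urls, limit, extractPage) {
  const results = [];
  const failures = [];
  const seen = new Set();
  for (const url of urls) {
    if (results.length + failures.length >= limit) break; // budget spent
    if (seen.has(url)) continue; // list pages often link the same post twice
    seen.add(url);
    try {
      results.push({ url, data: await extractPage(url) });
    } catch (err) {
      failures.push({ url, error: err instanceof Error ? err.message : String(err) });
    }
  }
  return { results, failures };
}

// Stub extractor: pretend any URL ending in /b is broken.
async function fakeExtract(url) {
  if (url.endsWith("/b")) throw new Error("selector not found");
  return { title: `Title of ${url}` };
}

const urls = [
  "https://example.com/a",
  "https://example.com/a",
  "https://example.com/b",
  "https://example.com/c",
  "https://example.com/d",
];
visitWithBudget(urls, 3, fakeExtract).then(({ results, failures }) => {
  console.log(results.map((r) => r.url).join(", ")); // https://example.com/a, https://example.com/c
  console.log(failures.map((f) => f.url).join(", ")); // https://example.com/b
});
```

Failed pages count against the budget here, on the assumption that the budget is meant to cap how many pages are visited rather than how many succeed.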
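Core Navigation Patterns
Pattern 1: List-detail navigation
Best for: blog indexes, search results, news listings
Step by step:
1. Navigate to the list page
2. Collect all article links
3. For each link:
   a. Click the link
   b. Wait for the page to load
   c. Extract the content
   d. Navigate back
   e. Continue with the next link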
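MCP commands:
// Navigate to the list
mcp_playwright.navigate('https://blog.com/posts')
// Count the article links once
const total = mcp_playwright.query_selector_all('article a[href]').length
// Process each one
for (let i = 0; i < total; i++) {
  // Re-query the list: handles from before the last navigation are stale
  const link = mcp_playwright.query_selector_all('article a[href]')[i]
  // Open the detail page
  mcp_playwright.click(link)
  mcp_playwright.wait_for_load_state('networkidle')
  // Extract the content
  const text = mcp_playwright.get_text('article')
  // Return to the list
  mcp_playwright.go_back()
  mcp_playwright.wait_for_load_state('networkidle')
}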
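Element handles belong to the page they were found on, so a common, more robust variant of this pattern collects the link URLs first and then navigates to each one directly instead of clicking and going back. The URL-collection step can be sketched in plain JavaScript: `normalizeLinks` is a hypothetical helper that takes raw `href` strings (gathered, for example, with `get_attribute` or `evaluate`), resolves them against the list page's URL with the standard `URL` class, and drops fragments, off-site links, `mailto:`/`javascript:` links, and duplicates while keeping the original order.

```javascript
// Turn raw href values scraped from a list page into a clean, ordered list
// of absolute article URLs. `normalizeLinks` is a hypothetical helper.
function normalizeLinks(hrefs, pageUrl, { sameOriginOnly = true } = {}) {
  const base = new URL(pageUrl);
  const seen = new Set();
  const out = [];
  for (const href of hrefs) {
    if (!href || href.startsWith("javascript:") || href.startsWith("mailto:")) continue;
    let url;
    try {
      url = new URL(href, base); // resolves relative paths such as "posts/2"
    } catch {
      continue; // malformed href
    }
    url.hash = ""; // "/posts/1#comments" is the same page as "/posts/1"
    if (sameOriginOnly && url.origin !== base.origin) continue;
    if (!seen.has(url.href)) {
      seen.add(url.href);
      out.push(url.href);
    }
  }
  return out;
}

const links = normalizeLinks(
  ["/posts/1", "posts/2", "/posts/1#comments", "https://other.site/x", "mailto:a@b.c", ""],
  "https://blog.com/posts",
);
console.log(links.join("\n"));
// https://blog.com/posts/1
// https://blog.com/posts/2
```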
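Pattern 2: Pagination traversal
Best for: multi-page results, forums, archives
Process:
1. Process the current page
2. Find the "next page" button
3. Click it and repeat
4. Stop when there are no more pages
MCP commands:
const max_pages = 50  // safety cap in case the "next" link never disappears
for (let page = 1; page <= max_pages; page++) {
  // Process the current page
  const articles = mcp_playwright.query_selector_all('article')
  for (const article of articles) {
    // Extract content
  }
  // Check for a next page
  const next_button = mcp_playwright.query_selector('a.next, [rel="next"]')
  if (!next_button) break
  // Go to the next page
  mcp_playwright.click(next_button)
  mcp_playwright.wait_for_load_state('networkidle')
}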
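The stop condition deserves care: some sites keep a "next" link on the last page or link back to page 1, so a missing button alone may never end the loop. A sketch of a guarded loop that stops on a missing link, a repeated URL, or a page cap; `paginate`, `processPage`, and `getNextUrl` are hypothetical (with MCP, `getNextUrl` would read the `href` of `query_selector('a.next, [rel="next"]')`), and an in-memory map stands in for a site whose last page links back to the first.

```javascript
// Walk "next" links until there are none, a URL repeats, or maxPages is hit.
// `processPage` and `getNextUrl` are hypothetical callbacks.
function paginate(startUrl, { processPage, getNextUrl, maxPages = 20 }) {
  const visited = new Set();
  let url = startUrl;
  while (url && !visited.has(url) && visited.size < maxPages) {
    visited.add(url);
    processPage(url);
    url = getNextUrl(url); // null when the page has no "next" link
  }
  return [...visited];
}

// Simulated site: page 3's "next" link points back to page 1.
const nextLinks = { "/p/1": "/p/2", "/p/2": "/p/3", "/p/3": "/p/1" };
const pages = paginate("/p/1", {
  processPage: (u) => console.log("processing", u),
  getNextUrl: (u) => nextLinks[u] ?? null,
});
console.log(pages); // [ '/p/1', '/p/2', '/p/3' ]
```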
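Pattern 3: Infinite scroll
Best for: social media, modern blogs, dynamic content
// Scroll-and-load loop
let current_height = 0
const max_rounds = 30  // safety cap: some feeds never stop growing
for (let round = 0; round < max_rounds; round++) {
  // Measure the current page height
  const new_height = mcp_playwright.evaluate('document.body.scrollHeight')
  if (new_height === current_height) break  // nothing new loaded since the last scroll
  // Scroll to the bottom
  mcp_playwright.evaluate('window.scrollTo(0, document.body.scrollHeight)')
  mcp_playwright.wait_for_timeout(2000)  // give the next batch time to load
  current_height = new_height
}
// Then extract everything that has loaded
const all_articles = mcp_playwright.query_selector_all('article')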
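Stopping on the first unchanged height can end too early when a batch is slow to arrive, and a truly endless feed never stops growing. A sketch of a more forgiving rule: stop after `patience` consecutive scrolls without growth, or after `maxRounds` scrolls in total. `scrollUntilStable`, `measureHeight`, and `scrollAndWait` are hypothetical names; the two callbacks would wrap the `evaluate` and `wait_for_timeout` calls above, and the stub feed below grows, stalls once, grows again, and then stops.

```javascript
// Scroll until the page height stops growing for `patience` scrolls in a row,
// or until `maxRounds` scrolls have been made, whichever comes first.
async function scrollUntilStable({ measureHeight, scrollAndWait, patience = 2, maxRounds = 30 }) {
  let lastHeight = await measureHeight();
  let stableRounds = 0;
  let rounds = 0;
  while (rounds < maxRounds && stableRounds < patience) {
    await scrollAndWait();
    rounds++;
    const height = await measureHeight();
    stableRounds = height > lastHeight ? 0 : stableRounds + 1; // reset on any growth
    lastHeight = height;
  }
  return { rounds, finalHeight: lastHeight };
}

// Stub feed: heights[k] is the page height after k scrolls.
const heights = [1000, 2000, 2000, 3000, 3000, 3000];
let i = 0;
scrollUntilStable({
  measureHeight: async () => heights[Math.min(i, heights.length - 1)],
  scrollAndWait: async () => { i++; },
}).then((result) => console.log(result)); // { rounds: 5, finalHeight: 3000 }
```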
Site-Specific Selector Patterns
Korean portals
Naver (네이버)
// Search results
list_selector: '.blog_list li'
link_selector: '.api_txt_lines.total_tit'
content_selector: '.se-main-container'
comment_selector: '.u_cbox_contents'
// Blog-specific
blog_content: '#postViewArea, .se-main-container'
blog_title: '.se-title-text, .pcol1'
Daum (다음)
// Search results
list_selector: '.list_info'
link_selector: '.f_link_b, .tit_main'
content_selector: '.article_view'
// Tistory blogs
tistory_content: '.entry-content, .article-view'
tistory_comments: '.comment-list'
Brunch
article_list: '.wrap_article_list li'
article_link: 'a.link_post'
article_content: '.wrap_body'
Global sites
Medium
article_list: 'article'
article_link: 'h2 a, h3 a'
article_content: 'section.pw-post-body'
clap_count: '[aria-label*="clap"]'
Reddit
post_list: '[data-testid="post-container"]'
post_content: '[data-click-id="text"]'
comments: '.Comment'
upvotes: '[aria-label*="upvote"]'
Generic patterns
// Works for most blogs
article_containers: 'article, .post, .entry, .blog-post'
title_selectors: 'h1, .title, .post-title, .entry-title'
content_selectors: 'main, .content, .post-content, .entry-content'
comment_selectors: '.comments, .comment-list, #comments'
author_selectors: '.author, .by-author, .writer, [rel="author"]'
date_selectors: 'time, .date, .published, .post-date'
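One way to apply these tables is a hostname lookup that falls back to the generic patterns for any field a site entry lacks. A sketch under assumptions: `SITE_SELECTORS`, `GENERIC`, and `selectorsFor` are hypothetical, the hostname keys are assumptions about where each selector set applies, and the selector strings are copied from the lists above, so they are only as current as those lists.

```javascript
// A few entries copied from the tables above; keys are assumed hostnames.
const SITE_SELECTORS = {
  "blog.naver.com": { content: "#postViewArea, .se-main-container", comments: ".u_cbox_contents" },
  "tistory.com": { content: ".entry-content, .article-view", comments: ".comment-list" },
  "brunch.co.kr": { content: ".wrap_body" },
  "medium.com": { content: "section.pw-post-body" },
  "reddit.com": { content: '[data-click-id="text"]', comments: ".Comment" },
};

// Generic patterns, used for any field a site entry does not define.
const GENERIC = {
  content: "main, .content, .post-content, .entry-content",
  comments: ".comments, .comment-list, #comments",
};

function selectorsFor(pageUrl) {
  const host = new URL(pageUrl).hostname;
  // Match the host itself or any subdomain of it (e.g. someone.tistory.com),
  // but not look-alikes such as notmedium.com.
  const key = Object.keys(SITE_SELECTORS).find((k) => host === k || host.endsWith("." + k));
  return { ...GENERIC, ...(key ? SITE_SELECTORS[key] : {}) };
}

console.log(selectorsFor("https://someone.tistory.com/12").content); // .entry-content, .article-view
console.log(selectorsFor("https://medium.com/@a/post").comments); // .comments, .comment-list, #comments
```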
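Smart Navigation Decisions
When to use each pattern
| Scenario | Pattern | Depth | Example |
|---|---|---|---|
| "뉴스 헤드라인 훑어봐" (skim the news headlines) | List-detail | Quick | 50 headlines |
| "블로그 글 10개 읽어줘" (read 10 blog posts) | List-detail | Standard | Full content |
| "포럼 댓글까지 분석해" (analyze the forum, comments included) | List-detail | Deep | With comments |
| "모든 페이지 다 봐" (look through every page) | Pagination | Standard | All pages |
| "인스타 피드 쭉 봐" (scroll through the Instagram feed) | Infinite scroll | Quick | Social media feed |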
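Selector priority strategy
1. Try site-specific selectors first
→ '.se-main-container' (Naver-specific)
2. Fall back to semantic HTML
→ 'article', 'main'
3. Try common class patterns
→ '.content', '.post-content'
4. Last resort: grab all the text
→ 'body' (then clean it up)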
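The priority list boils down to "return the first selector that yields text, and clean up aggressively only when you had to fall back to `body`." A sketch assuming a hypothetical synchronous `queryText(selector)` callback that returns an element's text or `null` (with MCP it would combine `query_selector` and `get_text`); `CONTENT_PRIORITY`, `cleanText`, and `extractWithFallback` are illustrative names, and a plain object stands in for the page.

```javascript
// Selectors in priority order: site-specific, semantic HTML, common classes, body.
const CONTENT_PRIORITY = [".se-main-container", "article", "main", ".content", ".post-content", "body"];

// Collapse repeated whitespace within lines and drop blank lines.
function cleanText(text) {
  return text
    .split("\n")
    .map((line) => line.replace(/\s+/g, " ").trim())
    .filter((line) => line.length > 0)
    .join("\n");
}

function extractWithFallback(queryText, selectors = CONTENT_PRIORITY) {
  for (const selector of selectors) {
    const text = queryText(selector);
    if (text && text.trim()) {
      // Only the last-resort selector gets the heavier cleanup.
      return { selector, text: selector === "body" ? cleanText(text) : text.trim() };
    }
  }
  return null; // nothing matched, not even body
}

// Fake page with no site-specific container and an empty <article>.
const fakePage = { article: "   ", main: "Main text of the post." };
console.log(extractWithFallback((sel) => fakePage[sel] ?? null));
// { selector: 'main', text: 'Main text of the post.' }
```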
MCP Command Reference
Basic commands
// Navigation
mcp_playwright.navigate(url)
mcp_playwright.go_back()
mcp_playwright.go_forward()
mcp_playwright.reload()
// Element selection
mcp_playwright.query_selector(selector)      // first match
mcp_playwright.query_selector_all(selector)  // all matches
// Interaction
mcp_playwright.click(element_or_selector)
mcp_playwright.type(selector, text)
mcp_playwright.scroll_to(x, y)
// Content extraction
mcp_playwright.get_text(selector)
mcp_playwright.get_attribute(selector, attribute)
mcp_playwright.get_url()
mcp_playwright.get_title()
// Waiting
mcp_playwright.wait_for_selector(selector)
mcp_playwright.wait_for_load_state('networkidle')  // Playwright also accepts 'load' and 'domcontentloaded'
mcp_playwright.wait_for_timeout(milliseconds)
// JavaScript execution
mcp_playwright.evaluate(javascript_code)
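Practical Examples
Example 1: Quick news scan
// User: "네이버 뉴스 제목만 빠르게 훑어줘" (quickly skim just the Naver News headlines)
mcp_playwright.navigate('https://news.naver.com')
mcp_playwright.wait_for_load_state('domcontentloaded')
const headlines = mcp_playwright.query_selector_all('.news_tit')
const titles = []
for (const headline of headlines.slice(0, 30)) {
  // Store each title
  titles.push(mcp_playwright.get_text(headline))
}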
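Example 2: In-depth blog exploration
// User: "이 블로그 최신글 5개 댓글까지 자세히 봐줘" (go through this blog's 5 latest posts in detail, comments included)
mcp_playwright.navigate('https://blog.example.com')
const count = Math.min(mcp_playwright.query_selector_all('.post-link').length, 5)
for (let i = 0; i < count; i++) {
  const post = mcp_playwright.query_selector_all('.post-link')[i]  // re-query after each go_back()
  mcp_playwright.click(post)
  mcp_playwright.wait_for_load_state('networkidle')
  // Collect the full post
  const title = mcp_playwright.get_text('h1')
  const content = mcp_playwright.get_text('article')
  const author = mcp_playwright.get_text('.author')
  const date = mcp_playwright.get_text('.date')
  // Collect every comment
  const comments = mcp_playwright.query_selector_all('.comment')
    .map(comment => mcp_playwright.get_text(comment))
  mcp_playwright.go_back()
  mcp_playwright.wait_for_load_state('networkidle')
}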
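Example 3: Forum thread analysis
// User: "레딧 스레드 전체 토론 분석해줘" (analyze the whole discussion in this Reddit thread)
mcp_playwright.navigate(reddit_url)
mcp_playwright.wait_for_load_state('networkidle')
// Main post
const main_post = mcp_playwright.get_text('[data-click-id="text"]')
const upvotes = mcp_playwright.get_text('[aria-label*="upvote"]')
// All comments, including nested replies (returned flat, in document order)
const all_comments = mcp_playwright.query_selector_all('.Comment')
for (const comment of all_comments) {
  const text = mcp_playwright.get_text(comment)
  // Parse the nesting structure
}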
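Because the comments come back as a flat list, the nesting has to be rebuilt. A minimal sketch using a stack, assuming each extracted comment carries a numeric `depth` (0 for a direct reply to the post); how to read that depth from the page (an attribute, an indentation level) is site-specific and not shown, and `buildThread` is a hypothetical helper.

```javascript
// Rebuild a reply tree from comments listed in document order.
// Each input item is { depth, text }, where depth 0 is a direct reply to the post.
function buildThread(flatComments) {
  const roots = [];
  const stack = []; // stack[d] holds the most recent comment seen at depth d
  for (const { depth, text } of flatComments) {
    const node = { text, replies: [] };
    stack.length = depth; // close any deeper branches that have ended
    const parent = depth > 0 ? stack[depth - 1] : undefined;
    // Top-level comments, and any comment whose parent level is missing, become roots.
    (parent ? parent.replies : roots).push(node);
    stack[depth] = node;
  }
  return roots;
}

const thread = buildThread([
  { depth: 0, text: "A" },
  { depth: 1, text: "A.1" },
  { depth: 2, text: "A.1.a" },
  { depth: 1, text: "A.2" },
  { depth: 0, text: "B" },
]);
console.log(JSON.stringify(thread.map((c) => [c.text, c.replies.map((r) => r.text)])));
// [["A",["A.1","A.2"]],["B",[]]]
```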
Error Handling Strategies
Common problems and solutions
- Selector not found
// Try several selectors in order; query_selector returns null when nothing matches
const content = mcp_playwright.query_selector('article')
  || mcp_playwright.query_selector('main')
  || mcp_playwright.query_selector('.content')
  || mcp_playwright.query_selector('body')
- Dynamic content not loaded
// Waiting strategies
mcp_playwright.wait_for_load_state('networkidle')
mcp_playwright.wait_for_selector('.content', { timeout: 10000 })
mcp_playwright.wait_for_timeout(2000)  // last resort
- Navigation failure
try {
  mcp_playwright.go_back()
} catch {
  // Fallback: navigate directly to the list page
  mcp_playwright.navigate(list_page_url)
}
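The waiting strategies and the navigation fallback share one idea: try, wait briefly, try again, and give up after a deadline. A generic sketch of that idea, assuming a hypothetical async `check()` callback that returns something truthy once the page is ready (for example, the result of a `query_selector` call) and `null` otherwise; `waitUntil` is not part of any MCP API, and the timeout and interval defaults are placeholders.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `check` until it returns a truthy value or `timeoutMs` elapses.
async function waitUntil(check, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let lastError = null;
  for (;;) {
    try {
      const value = await check();
      if (value) return value;
    } catch (err) {
      lastError = err; // treat a throwing check as "not ready yet"
    }
    if (Date.now() >= deadline) {
      const reason = lastError instanceof Error ? `: ${lastError.message}` : "";
      throw new Error(`Timed out after ${timeoutMs} ms${reason}`);
    }
    await sleep(intervalMs);
  }
}

// Stub: the element "appears" on the third check.
let calls = 0;
waitUntil(async () => (++calls >= 3 ? "<div class=content>" : null), { intervalMs: 10 })
  .then((v) => console.log(v, "after", calls, "checks")); // <div class=content> after 3 checks
```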
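Best Practices
Do ✅
- Wait for content to load before extracting
- Use site-specific selectors when possible
- Handle pagination and infinite scroll
- Extract metadata (author, date) when available
- Respect rate limits by pausing between requests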
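Don't ❌
- Don't assume a selector works on every site
- Don't skip wait_for_load_state
- Don't extract without checking that the element exists
- Don't ignore error handling
- Don't process too many pages in deep mode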
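For respecting rate limits, one simple approach is to enforce a minimum gap between page loads, with some random jitter so requests do not land on a fixed beat. A sketch of that idea: `PoliteScheduler` is hypothetical, the 1.5 s gap and 0.5 s jitter defaults are illustrative rather than limits any site publishes, and the clock and random source are injectable so the logic can be checked without actually waiting.

```javascript
// Enforces a minimum delay between consecutive page loads.
class PoliteScheduler {
  constructor({ minGapMs = 1500, jitterMs = 500, now = Date.now, random = Math.random } = {}) {
    this.minGapMs = minGapMs;
    this.jitterMs = jitterMs;
    this.now = now;
    this.random = random;
    this.nextAllowedAt = 0; // time (ms) before which the next request should wait
  }

  // Returns how long to wait before the next request and reserves the slot after it.
  reserve() {
    const t = this.now();
    const wait = Math.max(0, this.nextAllowedAt - t);
    const gap = this.minGapMs + Math.floor(this.random() * this.jitterMs);
    this.nextAllowedAt = t + wait + gap;
    return wait;
  }

  // Call right before each navigation that loads a new page.
  async beforeRequest() {
    const wait = this.reserve();
    if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
  }
}

// Example with a frozen fake clock: two back-to-back requests.
let fakeTime = 0;
const scheduler = new PoliteScheduler({ minGapMs: 1000, jitterMs: 0, now: () => fakeTime });
console.log(scheduler.reserve()); // 0
console.log(scheduler.reserve()); // 1000
```

In a crawl loop, `await scheduler.beforeRequest()` would sit right before each `navigate` or `click` that loads a new page.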
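Decision Flow Summary
User request
↓
Analyze keywords → determine depth (quick/standard/deep)
↓
Identify the site type → choose selectors
↓
Choose a pattern → (list-detail/pagination/scroll)
↓
Execute with MCP commands
↓
Handle errors → fallback strategies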
References
For detailed patterns and selectors by site type, see:
- references/site-selectors.md - comprehensive selector database
- references/navigation-patterns.md - advanced navigation techniques