Web Navigation Strategies Skill: web-navigation-strategies

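This skill provides strategies and selector guidance for systematic web exploration with Playwright MCP. It covers navigation patterns, a reading-depth decision framework, site-specific selectors, and error handling, and applies to automated testing, data extraction, and web content analysis. Keywords: Playwright MCP, web navigation, selector guide, automated testing, data extraction, SEO search optimization.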

Updated 3/23/2026

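name: web-navigation-strategies
description: Strategic navigation patterns and selector guidance for thorough web exploration with Playwright MCP. Provides decision trees, navigation strategies, and site-specific selectors for systematically reading multiple pages. Use it when planning how to navigate a website, deciding how deeply to read, or finding the right selectors for Playwright MCP commands.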

Web Navigation Strategies for Playwright MCP

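A strategy guide for systematic web exploration with Playwright MCP tools. This skill provides navigation patterns, not executable code: the mcp_playwright.* calls in the examples are pseudocode standing in for the corresponding MCP tool calls, not a real client library.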

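Reading-Depth Decision Framework

1. Analyze User Intent

Query analysis decision tree:
├── Keywords: "간단히, 요약, 훑어, quick, summary, overview" (Korean: briefly, summary, skim)
│   → **Quick mode**
│   → Extract: title + first paragraph only
│   → Pages: 30-50 (fast scan)
│
├── Keywords: "자세히, 상세, 모든, 댓글, detailed, comprehensive, everything" (Korean: in detail, detailed, all, comments)
│   → **Deep mode**
│   → Extract: full content + comments + metadata + related links
│   → Pages: 5-10 (thorough reading)
│
└── No specific keywords
    → **Standard mode**
    → Extract: main content + basic metadata
    → Pages: 15-20 (balanced)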
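The decision tree can be sketched as a small keyword classifier. This is a minimal Python sketch under stated assumptions: keywords are matched as plain substrings of the lowercased query, branches are checked in the tree's order (quick first, then deep), and each mode's page budget is the upper end of its range. `DepthPlan`, `classify_depth`, and the field names are hypothetical.

```python
from dataclasses import dataclass

# Trigger keywords copied from the decision tree; matched as substrings,
# so the stem "훑어" also matches "훑어봐" and "훑어줘".
QUICK_KEYWORDS = ("간단히", "요약", "훑어", "quick", "summary", "overview")
DEEP_KEYWORDS = ("자세히", "상세", "모든", "댓글", "detailed", "comprehensive", "everything")


@dataclass(frozen=True)
class DepthPlan:
    mode: str
    fields: tuple[str, ...]  # what to extract from each page
    max_pages: int           # upper end of the page range in the tree


QUICK = DepthPlan("quick", ("title", "first_paragraph"), 50)
DEEP = DepthPlan("deep", ("content", "comments", "metadata", "related_links"), 10)
STANDARD = DepthPlan("standard", ("content", "basic_metadata"), 20)


def classify_depth(query: str) -> DepthPlan:
    """Walk the branches in the tree's order: quick, then deep, else standard."""
    text = query.lower()
    if any(keyword in text for keyword in QUICK_KEYWORDS):
        return QUICK
    if any(keyword in text for keyword in DEEP_KEYWORDS):
        return DEEP
    return STANDARD


print(classify_depth("뉴스 헤드라인 훑어봐").mode)   # quick
print(classify_depth("포럼 댓글까지 분석해").mode)   # deep
print(classify_depth("블로그 글 10개 읽어줘").mode)  # standard
```

Checking the quick branch first means a query that mixes both keyword sets is treated as quick; reorder the checks if deep should win instead.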

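2. Execute Based on Depth

Quick Mode Strategy

// Titles and summaries only
mcp_playwright.navigate(url)
results = []
count = Math.min(mcp_playwright.query_selector_all('h2 a, h3 a').length, 50)
for (let i = 0; i < count; i++) {
    // Re-query every time: element handles go stale after go_back()
    link = mcp_playwright.query_selector_all('h2 a, h3 a')[i]
    mcp_playwright.click(link)
    // A DOM-ready wait is usually enough for a title and first paragraph
    mcp_playwright.wait_for_load_state('domcontentloaded')
    title = mcp_playwright.get_text('h1')
    summary = mcp_playwright.get_text('p:first-of-type')
    results.push({ title, summary })
    mcp_playwright.go_back()
}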

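Standard Mode Strategy

// Full main content
mcp_playwright.navigate(url)
results = []
count = Math.min(mcp_playwright.query_selector_all('article a').length, 20)
for (let i = 0; i < count; i++) {
    // Re-query every time: element handles go stale after go_back()
    link = mcp_playwright.query_selector_all('article a')[i]
    mcp_playwright.click(link)
    mcp_playwright.wait_for_load_state('networkidle')
    results.push(mcp_playwright.get_text('main, article, .content'))
    mcp_playwright.go_back()
}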

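Deep Mode Strategy

// Everything, including the discussion
mcp_playwright.navigate(url)
results = []
count = Math.min(mcp_playwright.query_selector_all('article a').length, 10)
for (let i = 0; i < count; i++) {
    // Re-query every time: element handles go stale after go_back()
    link = mcp_playwright.query_selector_all('article a')[i]
    mcp_playwright.click(link)
    mcp_playwright.wait_for_load_state('networkidle')

    results.push({
        // Main content
        content: mcp_playwright.get_text('article, main'),
        // Comments, one string per comment
        comments: mcp_playwright.query_selector_all('.comment')
            .map(comment => mcp_playwright.get_text(comment)),
        // Metadata
        author: mcp_playwright.get_text('.author, .writer'),
        date: mcp_playwright.get_text('time, .date'),
        // Related links, stored as URLs so they survive go_back()
        related: mcp_playwright.query_selector_all('.related a')
            .map(a => mcp_playwright.get_attribute(a, 'href')),
    })

    mcp_playwright.go_back()
}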

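Core Navigation Patterns

Pattern 1: List-Detail Navigation

Use for: blog indexes, search results, news lists

Step-by-step process:
1. Navigate to the list page
2. Collect all article links
3. For each link:
   a. Click the link (re-query the list first: handles from before the last go_back() are stale)
   b. Wait for the page to load
   c. Extract the content
   d. Navigate back
   e. Continue with the next link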

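MCP commands:

// Navigate to the list
mcp_playwright.navigate('https://blog.com/posts')

// Count the article links
count = mcp_playwright.query_selector_all('article a[href]').length

// Process each one
for (let i = 0; i < count; i++) {
    // Re-query the list: handles from before go_back() are stale
    link = mcp_playwright.query_selector_all('article a[href]')[i]

    // Open the detail page
    mcp_playwright.click(link)
    mcp_playwright.wait_for_load_state('networkidle')

    // Extract content
    text = mcp_playwright.get_text('article')

    // Return to the list
    mcp_playwright.go_back()
    mcp_playwright.wait_for_load_state('networkidle')
}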
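A common variant collects the detail URLs up front and then visits each one directly. That sidesteps stale handles entirely and makes it easy to de-duplicate links, since list pages often link the same post more than once (title and thumbnail, for example). A minimal Python sketch, assuming a hypothetical in-memory `SITE` dict in place of a real browser; in a live session each lookup would be a navigate, a wait, and a get_text call. `SITE` and `list_detail` are illustrative names.

```python
from urllib.parse import urljoin

# Hypothetical in-memory site standing in for a browser session:
# URL -> (hrefs found on the page, text extracted from the page's article)
SITE = {
    "https://blog.example.com/posts": (["/posts/1", "/posts/2", "/posts/1"], ""),
    "https://blog.example.com/posts/1": ([], "First post"),
    "https://blog.example.com/posts/2": ([], "Second post"),
}


def list_detail(list_url: str, max_pages: int) -> dict[str, str]:
    """Collect detail URLs from the list page first, then visit each one."""
    hrefs, _ = SITE[list_url]
    # Resolve relative hrefs and drop duplicates while keeping list order
    detail_urls = list(dict.fromkeys(urljoin(list_url, href) for href in hrefs))
    results = {}
    for url in detail_urls[:max_pages]:
        # In a live session: navigate(url), wait, then get_text('article')
        _, text = SITE[url]
        results[url] = text
    return results
```

The trade-off is that direct navigation skips any click handlers on the list page; if a site only opens posts through JavaScript, the click-and-go-back loop is still needed.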

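Pattern 2: Pagination Traversal

Use for: multi-page results, forums, archives

Process:
1. Process the current page
2. Find the "Next" button
3. Click it and repeat
4. Stop when there are no more pages (or a page cap is reached)

MCP commands:

max_pages = 20  // example safety cap; match it to the depth mode's page budget
for (let page = 1; page <= max_pages; page++) {
    // Process the current page
    articles = mcp_playwright.query_selector_all('article')
    for (const article of articles) {
        // Extract content
    }

    // Check for a next page
    next_button = mcp_playwright.query_selector('a.next, [rel="next"]')
    if (!next_button) break

    // Go to the next page
    mcp_playwright.click(next_button)
    mcp_playwright.wait_for_load_state('networkidle')
}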
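The loop above relies on the "next" link disappearing on the last page. Some sites instead keep a "next" link that points back to a page already seen, so it can help to track visited URLs as well. A minimal Python sketch, assuming pages are modeled as a hypothetical in-memory dict of URL → (articles, next URL); `PAGES` and `paginate` are illustrative names.

```python
# Hypothetical archive: URL -> (article titles on the page, "next" link URL or None)
PAGES = {
    "/archive?page=1": (["a", "b"], "/archive?page=2"),
    "/archive?page=2": (["c"], "/archive?page=3"),
    "/archive?page=3": (["d"], None),
}


def paginate(pages: dict, start: str, max_pages: int = 20) -> list[str]:
    """Follow "next" links until none is left, a URL repeats, or the cap is hit."""
    collected: list[str] = []
    visited: set[str] = set()
    url = start
    while url is not None and url not in visited and len(visited) < max_pages:
        visited.add(url)
        articles, next_url = pages[url]
        collected.extend(articles)  # process the current page
        url = next_url              # None means there is no next page
    return collected


print(paginate(PAGES, "/archive?page=1"))  # ['a', 'b', 'c', 'd']
```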

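Pattern 3: Infinite Scroll

Use for: social media, modern blogs, dynamic content

// Scroll-and-load pattern
max_scrolls = 30  // example safety cap: some feeds never stop growing
previous_height = 0
for (let i = 0; i < max_scrolls; i++) {
    // Measure the current page height
    height = mcp_playwright.evaluate('document.body.scrollHeight')

    // No growth since the last scroll: nothing more is loading
    if (height == previous_height) break

    // Scroll to the bottom
    mcp_playwright.evaluate('window.scrollTo(0, document.body.scrollHeight)')
    mcp_playwright.wait_for_timeout(2000)  // give new content time to load

    previous_height = height
}

// Then extract everything that has loaded
// (virtualized feeds remove off-screen items; on those, extract inside the loop)
all_articles = mcp_playwright.query_selector_all('article')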
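The stop condition (page height unchanged after a scroll) can be checked without a browser. A minimal Python sketch, assuming a hypothetical `FakeFeed` that grows by one batch per scroll until it runs out; in a live session `scroll_height()` and `scroll_to_bottom()` would be the two evaluate calls shown above, followed by the wait. The `max_scrolls` cap is an added safeguard, not a Playwright setting.

```python
class FakeFeed:
    """Hypothetical stand-in for a page: each scroll loads one more batch until exhausted."""

    def __init__(self, batches: int, batch_height: int = 800):
        self.remaining = batches
        self.batch_height = batch_height
        self.height = batch_height

    def scroll_height(self) -> int:
        return self.height

    def scroll_to_bottom(self) -> None:
        if self.remaining > 0:
            self.remaining -= 1
            self.height += self.batch_height


def scroll_until_stable(feed: FakeFeed, max_scrolls: int = 30) -> int:
    """Return the number of scrolls performed before the height stopped growing."""
    previous_height = 0
    scrolls = 0
    while scrolls < max_scrolls:
        height = feed.scroll_height()
        if height == previous_height:
            break  # nothing new loaded since the last scroll
        feed.scroll_to_bottom()  # a live session would also wait ~2 s here
        scrolls += 1
        previous_height = height
    return scrolls


print(scroll_until_stable(FakeFeed(batches=3)))  # 4
```

Three batches take four scrolls because the final scroll is what confirms that nothing more loads.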

Site-Specific Selector Patterns

Korean Portals

Naver (네이버)

// Search results
list_selector: '.blog_list li'
link_selector: '.api_txt_lines.total_tit'
content_selector: '.se-main-container'
comment_selector: '.u_cbox_contents'

// Blog-specific
blog_content: '#postViewArea, .se-main-container'
blog_title: '.se-title-text, .pcol1'

Daum (다음)

// Search results
list_selector: '.list_info'
link_selector: '.f_link_b, .tit_main'
content_selector: '.article_view'

// Tistory blogs
tistory_content: '.entry-content, .article-view'
tistory_comments: '.comment-list'

Brunch

article_list: '.wrap_article_list li'
article_link: 'a.link_post'
article_content: '.wrap_body'

Global Sites

Medium

article_list: 'article'
article_link: 'h2 a, h3 a'
article_content: 'section.pw-post-body'
clap_count: '[aria-label*="clap"]'

Reddit

post_list: '[data-testid="post-container"]'
post_content: '[data-click-id="text"]'
comments: '.Comment'
upvotes: '[aria-label*="upvote"]'

Generic Patterns

// Works for most blogs
article_containers: 'article, .post, .entry, .blog-post'
title_selectors: 'h1, .title, .post-title, .entry-title'
content_selectors: 'main, .content, .post-content, .entry-content'
comment_selectors: '.comments, .comment-list, #comments'
author_selectors: '.author, .by-author, .writer, [rel="author"]'
date_selectors: 'time, .date, .published, .post-date'
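Site-specific and generic selectors fit together as a lookup keyed by hostname, with the generic set filling any gaps. A minimal Python sketch; only a subset of the selector lists above is transcribed, the hostnames attached to each set are assumptions about where it applies, and the selectors themselves may drift as sites redesign. `SITE_SELECTORS`, `GENERIC`, and `selectors_for` are hypothetical names.

```python
from urllib.parse import urlparse

# Selector sets transcribed from the lists above; the hostname keys are assumptions.
SITE_SELECTORS = {
    "blog.naver.com": {"content": "#postViewArea, .se-main-container", "title": ".se-title-text, .pcol1"},
    "tistory.com": {"content": ".entry-content, .article-view", "comments": ".comment-list"},
    "brunch.co.kr": {"content": ".wrap_body"},
    "medium.com": {"content": "section.pw-post-body"},
    "reddit.com": {"content": '[data-click-id="text"]', "comments": ".Comment"},
}

# Generic patterns, used for unknown sites and for keys a site entry lacks
GENERIC = {
    "title": "h1, .title, .post-title, .entry-title",
    "content": "main, .content, .post-content, .entry-content",
    "comments": ".comments, .comment-list, #comments",
    "author": '.author, .by-author, .writer, [rel="author"]',
    "date": "time, .date, .published, .post-date",
}


def selectors_for(url: str) -> dict[str, str]:
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Try the full host, then each parent domain: "foo.tistory.com" -> "tistory.com"
    for i in range(len(parts) - 1):
        candidate = ".".join(parts[i:])
        if candidate in SITE_SELECTORS:
            # Site-specific keys win; generic ones fill the gaps
            return {**GENERIC, **SITE_SELECTORS[candidate]}
    return dict(GENERIC)


print(selectors_for("https://example.tistory.com/12")["content"])  # .entry-content, .article-view
```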

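Smart Navigation Decisions

When to Use Each Pattern

| Scenario | Pattern | Depth | Example |
|---|---|---|---|
| "뉴스 헤드라인 훑어봐" (skim the news headlines) | List-detail | Quick | 50 headlines |
| "블로그 글 10개 읽어줘" (read 10 blog posts) | List-detail | Standard | Full content |
| "포럼 댓글까지 분석해" (analyze the forum, comments included) | List-detail | Deep | With comments |
| "모든 페이지 다 봐" (look through every page) | Pagination | Standard | All pages |
| "인스타 피드 쭉 봐" (scroll through the Instagram feed) | Infinite scroll | Quick | Social media |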

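Selector Priority Strategy

1. Try site-specific selectors first
   → '.se-main-container' (Naver-specific)

2. Fall back to semantic HTML
   → 'article', 'main'

3. Try common class patterns
   → '.content', '.post-content'

4. Last resort: grab all the text
   → 'body' (then clean it up)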
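The priority order can be expressed as ordered tiers, returning the first selector that yields non-empty text. In this minimal Python sketch the page is modeled as a hypothetical `dom` dict from selector to text (a live session would call query_selector and get_text instead), and the "clean it up" step for 'body' is reduced to collapsing whitespace; real cleanup would also need to strip navigation and footer text. Treating empty text as "not found" guards against containers that match but hold nothing.

```python
import re

# Priority tiers transcribed from the list above (site-specific example: Naver)
SELECTOR_TIERS = [
    [".se-main-container"],         # 1. site-specific
    ["article", "main"],            # 2. semantic HTML
    [".content", ".post-content"],  # 3. common class patterns
    ["body"],                       # 4. last resort
]


def extract_main_text(dom: dict[str, str]) -> tuple[str, str]:
    """Return (selector used, text); ("", "") if nothing matched."""
    for tier in SELECTOR_TIERS:
        for selector in tier:
            text = dom.get(selector, "").strip()
            if text:
                if selector == "body":
                    # Minimal cleanup: collapse runs of whitespace
                    text = re.sub(r"\s+", " ", text)
                return selector, text
    return "", ""


print(extract_main_text({"main": "Main text", "body": "All"}))  # ('main', 'Main text')
```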

MCP Command Reference

Basic Commands

// Navigation
mcp_playwright.navigate(url)
mcp_playwright.go_back()
mcp_playwright.go_forward()
mcp_playwright.reload()

// Element selection
mcp_playwright.query_selector(selector)        // first match
mcp_playwright.query_selector_all(selector)    // all matches

// Interaction
mcp_playwright.click(element_or_selector)
mcp_playwright.type(selector, text)
mcp_playwright.scroll_to(x, y)

// Content extraction
mcp_playwright.get_text(selector_or_element)
mcp_playwright.get_attribute(selector, attribute)
mcp_playwright.get_url()
mcp_playwright.get_title()

// Waiting
mcp_playwright.wait_for_selector(selector)
mcp_playwright.wait_for_load_state(state)      // 'load', 'domcontentloaded', or 'networkidle'
mcp_playwright.wait_for_timeout(milliseconds)

// JavaScript execution
mcp_playwright.evaluate(javascript_code)

Worked Examples

Example 1: Quick News Scan

// User: "네이버 뉴스 제목만 빠르게 훑어줘" ("Quickly skim just the Naver News headlines")
mcp_playwright.navigate('https://news.naver.com')
headlines = mcp_playwright.query_selector_all('.news_tit')
titles = []
for (const headline of headlines.slice(0, 30)) {
    titles.push(mcp_playwright.get_text(headline))  // store the headline
}

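Example 2: Deep Blog Exploration

// User: "이 블로그 최신글 5개 댓글까지 자세히 봐줘" ("Go through this blog's 5 latest posts in detail, comments included")
mcp_playwright.navigate('https://blog.example.com')
count = Math.min(mcp_playwright.query_selector_all('.post-link').length, 5)

for (let i = 0; i < count; i++) {
    // Re-query every time: element handles go stale after go_back()
    post = mcp_playwright.query_selector_all('.post-link')[i]
    mcp_playwright.click(post)
    mcp_playwright.wait_for_load_state('networkidle')

    // Get all content
    title = mcp_playwright.get_text('h1')
    content = mcp_playwright.get_text('article')
    author = mcp_playwright.get_text('.author')
    date = mcp_playwright.get_text('.date')

    // Get all comments
    comments = mcp_playwright.query_selector_all('.comment')
        .map(comment => mcp_playwright.get_text(comment))

    mcp_playwright.go_back()
    mcp_playwright.wait_for_load_state('networkidle')
}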

Example 3: Forum Thread Analysis

// User: "레딧 스레드 전체 토론 분석해줘" ("Analyze the whole discussion in this Reddit thread")
mcp_playwright.navigate(reddit_url)

// Main post
main_post = mcp_playwright.get_text('[data-click-id="text"]')
upvotes = mcp_playwright.get_text('[aria-label*="upvote"]')

// All comments, including nested replies
all_comments = mcp_playwright.query_selector_all('.Comment')
for (const comment of all_comments) {
    text = mcp_playwright.get_text(comment)
    // Parse the nesting structure (which comment replies to which)
}
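The "parse the nesting structure" step can be sketched as a stack-based tree build. This minimal Python sketch assumes each comment's reply depth can be read from the page (for example an indentation level or a depth attribute; which one is available depends on the Reddit layout, so treat that as an assumption) and that comments arrive in page order. `CommentNode` and `build_thread` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class CommentNode:
    text: str
    depth: int
    replies: list["CommentNode"] = field(default_factory=list)


def build_thread(flat: list[tuple[int, str]]) -> list[CommentNode]:
    """Turn (depth, text) pairs in page order into a reply tree."""
    roots: list[CommentNode] = []
    stack: list[CommentNode] = []  # path from a root to the last comment seen
    for depth, text in flat:
        node = CommentNode(text, depth)
        # Pop until the top of the stack is this comment's parent
        while stack and stack[-1].depth >= depth:
            stack.pop()
        if stack:
            stack[-1].replies.append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


thread = build_thread([(0, "A"), (1, "A.1"), (2, "A.1.a"), (1, "A.2"), (0, "B")])
print([reply.text for reply in thread[0].replies])  # ['A.1', 'A.2']
```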

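Error Handling Strategies

Common Problems and Solutions

  1. Selector not found

    // Try several selectors in turn; query_selector returns null when nothing matches
    content = mcp_playwright.query_selector('article')
            || mcp_playwright.query_selector('main')
            || mcp_playwright.query_selector('.content')
            || mcp_playwright.query_selector('body')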
    
  2. Dynamic content not loaded

    // Wait strategies
    mcp_playwright.wait_for_load_state('networkidle')
    mcp_playwright.wait_for_selector('.content', { timeout: 10000 })  // timeout in ms
    mcp_playwright.wait_for_timeout(2000)  // last resort
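A wait with a timeout is a poll-until-true loop with a deadline. The minimal Python sketch below shows that loop for cases no built-in wait covers, such as waiting until at least N comments are present. `wait_until` is a hypothetical helper; the clock and sleep functions are injectable so it can be tested without real delays, and it uses seconds where the timeout above is in milliseconds.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False  # give up; the caller decides on a fallback
        sleep(interval)
```

Using a monotonic clock keeps the deadline correct even if the system time changes while polling.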
    
  3. Navigation failure

    try {
        mcp_playwright.go_back()
    } catch {
        // Fallback: navigate directly
        mcp_playwright.navigate(list_page_url)
    }
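The fallback can be extended to check where go_back actually landed, because history navigation can succeed yet end up on the wrong page (for example on an intermediate redirect). A minimal Python sketch, assuming a hypothetical `FakeBrowser` with a history stack in place of a real session; `FakeBrowser` and `return_to_list` are illustrative names.

```python
class FakeBrowser:
    """Hypothetical browser with a history stack; go_back() can be told to fail."""

    def __init__(self, history: list[str], back_fails: bool = False):
        self.history = list(history)
        self.back_fails = back_fails

    @property
    def url(self) -> str:
        return self.history[-1]

    def go_back(self) -> None:
        if self.back_fails or len(self.history) < 2:
            raise RuntimeError("history navigation failed")
        self.history.pop()

    def navigate(self, url: str) -> None:
        self.history.append(url)


def return_to_list(browser: FakeBrowser, list_url: str) -> str:
    """Try go_back(); fall back to direct navigation. Returns the method that worked."""
    try:
        browser.go_back()
        if browser.url == list_url:
            return "go_back"
    except RuntimeError:
        pass
    # Fallback: navigate directly (also covers landing on the wrong page)
    browser.navigate(list_url)
    return "navigate"


browser = FakeBrowser(["/list", "/redirect", "/post/1"])
print(return_to_list(browser, "/list"), browser.url)  # navigate /list
```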
    

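Best Practices

Do ✅

  • Wait for content to load before extracting
  • Use site-specific selectors when possible
  • Handle pagination and infinite scroll
  • Extract metadata (author, date) when it is available
  • Respect rate limits: wait between requests

Don't ❌

  • Don't assume a selector works on every site
  • Don't skip wait_for_load_state
  • Don't extract without checking that the element exists
  • Don't ignore error handling
  • Don't process too many pages in deep mode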
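The rate-limit advice in the Do list can be sketched as a minimum-interval limiter that is called before each page load. This Python sketch reflects one assumption about how to space requests, not a rule from the skill: the interval is the caller's choice, and a site's robots.txt or terms may ask for more. `MinIntervalLimiter` is a hypothetical name, and the clock and sleep functions are injectable for testing.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return the seconds slept."""
        now = self.clock()
        delay = 0.0
        if self.last is not None:
            delay = max(0.0, self.min_interval - (now - self.last))
            if delay:
                self.sleep(delay)
        self.last = now + delay  # when this request actually goes out
        return delay
```

Call limiter.wait() immediately before each navigate or click that loads a new page; time already spent extracting counts toward the interval, so fast pages are delayed and slow ones are not.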

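Decision Flow Summary

User request
    ↓
Analyze keywords → determine depth (quick/standard/deep)
    ↓
Identify site type → choose selectors
    ↓
Choose pattern → (list-detail/pagination/scroll)
    ↓
Execute with MCP commands
    ↓
Handle errors → fallback strategies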

References

For detailed patterns and selectors by site type, see: