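name: phoenix-playwright-tests
description: Write Playwright end-to-end tests for the Phoenix AI observability platform. Use when creating, updating, or debugging Playwright tests, or when the user asks about testing UI features, writing E2E tests, or automating browser interactions for Phoenix.
metadata:
  internal: true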

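Phoenix Playwright Test Writing

Write end-to-end tests for Phoenix using Playwright. Tests live in app/tests/ and follow established patterns.

Timeout Policy

Do not pass timeout arguments in test code under app/tests.
Tune timing centrally in app/playwright.config.ts (the global timeout, expect.timeout, use.navigationTimeout, and webServer.timeout).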
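
As a rough illustration of where those four settings live, the sketch below shows one possible shape for app/playwright.config.ts. Every numeric value, the testDir, the baseURL, and the webServer command are assumptions made for this example, not the project's actual configuration.

```typescript
// playwright.config.ts (sketch; all values below are illustrative assumptions)
import { defineConfig } from "@playwright/test";

export default defineConfig({
  // Assumed location of the specs relative to this config file.
  testDir: "./tests",
  // Global timeout: upper bound for a single test, in milliseconds.
  timeout: 60_000,
  expect: {
    // Upper bound for each auto-retrying expect(...) assertion.
    timeout: 10_000,
  },
  use: {
    // Assumed base URL, so tests can navigate with relative paths.
    baseURL: "http://localhost:6006",
    // Upper bound for navigations such as page.goto and page.waitForURL.
    navigationTimeout: 30_000,
  },
  webServer: {
    // Hypothetical start command; substitute the project's real one.
    command: "pnpm run start",
    url: "http://localhost:6006",
    // How long to wait for the server to become reachable.
    timeout: 120_000,
    reuseExistingServer: !process.env.CI,
  },
});
```

Keeping every timing knob in this one file is what lets the tests themselves stay free of timeout arguments.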

Quick Start

import { expect, test } from "@playwright/test";
import { randomUUID } from "crypto";

test.describe("Feature Name", () => {
  test.beforeEach(async ({ page }) => {
    // Log in as the admin user before every test
    await page.goto(`/login`);
    await page.getByLabel("Email").fill("admin@localhost");
    await page.getByLabel("Password").fill("admin123");
    await page.getByRole("button", { name: "Log In", exact: true }).click();
    await page.waitForURL("**/projects");
  });

  test("can do something", async ({ page }) => {
    // Test implementation
  });
});

Test Credentials

User	Email	Password	Role
Admin	admin@localhost	admin123	admin
Member	member@localhost.com	member123	member
Viewer	viewer@localhost.com	viewer123	viewer
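
To avoid repeating these literals across spec files, the table above could be captured once in a typed lookup. This is only a sketch: the names Role, TEST_USERS, and credentialsFor are hypothetical and do not come from the Phoenix codebase.

```typescript
// Roles and credentials copied from the credentials table.
type Role = "admin" | "member" | "viewer";

interface Credentials {
  email: string;
  password: string;
}

const TEST_USERS: Record<Role, Credentials> = {
  admin: { email: "admin@localhost", password: "admin123" },
  member: { email: "member@localhost.com", password: "member123" },
  viewer: { email: "viewer@localhost.com", password: "viewer123" },
};

// Hypothetical helper: look up the test credentials for a role.
function credentialsFor(role: Role): Credentials {
  return TEST_USERS[role];
}
```

In a beforeEach hook, a test would then read `const { email, password } = credentialsFor("viewer")` and pass those values to the login form's fill calls.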

Selector Patterns (Priority Order)

Role selectors (most robust):

page.getByRole("button", { name: "Save" });
page.getByRole("link", { name: "Datasets" });
page.getByRole("tab", { name: /Evaluators/i });
page.getByRole("menuitem", { name: "Edit" });
page.getByRole("cell", { name: "My Project" });
page.getByRole("heading", { name: "Title" });
page.getByRole("dialog");
page.getByRole("textbox", { name: "Name" });
page.getByRole("combobox", { name: /mapping/i });

Label selectors:

page.getByLabel("Email");
page.getByLabel("Dataset Name");
page.getByLabel("Description");

Text selectors:

page.getByText("No evaluators added");
page.getByPlaceholder("Search...");

Test IDs (when available):
```
page.getByTestId("modal");
```

CSS locators (last resort):

page.locator('button:has-text("Save")');

Common UI Patterns

Dropdown Menus

// Click the button to open the dropdown
await page.getByRole("button", { name: "New Dataset" }).click();
// Select a menu item
await page.getByRole("menuitem", { name: "New Dataset" }).click();

Nested Menus (Submenus)

// Open the menu, hover over the submenu trigger, then click the submenu item
await page.getByRole("button", { name: "Add evaluator" }).click();
await page
  .getByRole("menuitem", { name: "Use LLM evaluator template" })
  .hover();
await page.getByRole("menuitem", { name: /correctness/i }).click();

// IMPORTANT: always use getByRole("menuitem") for submenu items, not getByText()
// Playwright's auto-waiting handles the time it takes the submenu to appear
// ❌ BAD - flaky in CI:
// await page.getByText("Exact Match").first().click();
// ✅ GOOD - reliable:
// await page.getByRole("menuitem", { name: /Exact Match/i }).click();

Dialogs/Modals

// Wait for the dialog
await expect(page.getByRole("dialog")).toBeVisible();
// Fill in the form inside the dialog
await page.getByLabel("Name").fill("test-name");
// Submit
await page.getByRole("button", { name: "Create" }).click();
// Wait for it to close
await expect(page.getByRole("dialog")).not.toBeVisible();

Tables with Row Actions

// Find the row by its cell content
const row = page.getByRole("row").filter({
  has: page.getByRole("cell", { name: "Project Name" }),
});
// Click the action button in the row (usually the last button)
await row.getByRole("button").last().click();
// Select an action from the menu
await page.getByRole("menuitem", { name: "Edit" }).click();

Tabs

await page.getByRole("tab", { name: /Evaluators/i }).click();
await page.waitForURL("**/evaluators");
await expect(page.getByRole("tab", { name: /Evaluators/i })).toHaveAttribute(
  "aria-selected",
  "true",
);

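Form Inputs in Sections

// When several textboxes exist, scope the lookup to a section.
// Here the section is reached by walking two levels up from its header button.
const systemSection = page.locator('button:has-text("System")');
const systemTextbox = systemSection
  .locator("..")
  .locator("..")
  .getByRole("textbox");
await systemTextbox.fill("content");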

Serial Tests (Shared State)

When tests depend on one another, use test.describe.serial:

test.describe.serial("Workflow", () => {
  const itemName = `item-${randomUUID()}`;

  test("step 1: create item", async ({ page }) => {
    // Create itemName
  });

  test("step 2: edit item", async ({ page }) => {
    // Use itemName from the previous test
  });

  test("step 3: verify edit", async ({ page }) => {
    // Verify that itemName was edited
  });
});

Assertions

// Visibility
await expect(element).toBeVisible();
await expect(element).not.toBeVisible();

// Text content
await expect(element).toHaveText("expected value");
await expect(element).toContainText("partial");

// Attributes
await expect(element).toHaveAttribute("aria-selected", "true");

// Input values
await expect(input).toHaveValue("expected value");

// URL
await page.waitForURL("**/datasets/**/examples");

Navigation Patterns

// Direct navigation
await page.goto("/datasets");
await page.waitForURL("**/datasets");

// Click navigation
await page.getByRole("link", { name: "Datasets" }).click();
await page.waitForURL("**/datasets");

// Extract an ID from the URL
const url = page.url();
const match = url.match(/datasets\/([^/]+)/);
const datasetId = match ? match[1] : "";

// Navigation with query parameters
await page.goto(`/playground?datasetId=${datasetId}`);
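
The ID extraction and query-string steps above can be factored into two small pure functions, which makes the "no ID in the URL" case explicit instead of falling back to an empty string. This is a sketch only: extractDatasetId and playgroundPath are hypothetical names, and the input is assumed to be an absolute URL such as the one page.url() returns.

```typescript
// Hypothetical helper: pull the dataset ID out of an absolute URL such as
// http://localhost:6006/datasets/<id>/examples. Returns null when absent.
function extractDatasetId(url: string): string | null {
  const match = new URL(url).pathname.match(/\/datasets\/([^/]+)/);
  const id = match?.[1];
  return id ? decodeURIComponent(id) : null;
}

// Hypothetical helper: build the playground path for a dataset, letting
// URLSearchParams take care of escaping the ID.
function playgroundPath(datasetId: string): string {
  const params = new URLSearchParams({ datasetId });
  return `/playground?${params.toString()}`;
}
```

A test could then fail fast with a clear message when extractDatasetId(page.url()) returns null, rather than navigating to a playground URL with an empty datasetId.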

Running Tests

Before running Playwright tests, build the app so that E2E runs against the latest frontend changes:

pnpm run build

# Run a specific test file
pnpm exec playwright test tests/server-evaluators.spec.ts --project=chromium

# Run in UI mode
pnpm exec playwright test --ui

# Run specific tests by name
pnpm exec playwright test -g "can create"

# Debug mode
pnpm exec playwright test --debug

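Avoiding the Interactive Report Server

By default, Playwright serves an HTML report once the tests finish and waits for Ctrl+C, which can cause the command to time out. Use one of these options to avoid that:

# Use the list reporter (no interactive server)
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=list

# Use the dot reporter for minimal output
pnpm exec playwright test tests/example.spec.ts --project=chromium --reporter=dot

# Set CI mode to disable interactive features
CI=1 pnpm exec playwright test tests/example.spec.ts --project=chromium

Recommendation for automation: when running tests programmatically, always use --reporter=list or CI=1 so the command exits cleanly once the tests finish.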
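
For scripts that launch the test run themselves, the recommendation can be encoded once so the non-interactive flags are never forgotten. The sketch below only assembles the invocation; buildPlaywrightCommand and the PlaywrightCommand shape are hypothetical, and the flags are the ones shown in the commands above.

```typescript
interface PlaywrightCommand {
  command: string;
  args: string[];
  // Extra environment variables to merge into the child process environment.
  env: Record<string, string>;
}

// Hypothetical helper: assemble a non-interactive run of a single spec file.
function buildPlaywrightCommand(specFile: string): PlaywrightCommand {
  return {
    command: "pnpm",
    args: [
      "exec",
      "playwright",
      "test",
      specFile,
      "--project=chromium",
      "--reporter=list",
    ],
    env: { CI: "1" },
  };
}
```

The result can be handed to `spawnSync(command, args, { env: { ...process.env, ...env }, stdio: "inherit" })` from node:child_process. Setting both the list reporter and CI=1 is redundant but harmless.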

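Phoenix-Specific Pages

Page	URL Pattern	Key Elements
Datasets	`/datasets`	Table, "New Dataset" button
Dataset Detail	`/datasets/{id}/examples`	Tabs (Experiments, Examples, Evaluators, Versions)
Dataset Evaluators	`/datasets/{id}/evaluators`	"Add evaluator" button, evaluators table
Playground	`/playground`	Prompts section, experiment section
Playground + Dataset	`/playground?datasetId={id}`	Dataset selector, evaluators button
Prompts	`/prompts`	"New Prompt" button, prompts table
Settings	`/settings/general`	"Add User" button, users table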

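Exploring the UI with agent-browser

When selectors are unclear, use agent-browser to explore the Phoenix UI. For detailed agent-browser usage, invoke the /agent-browser skill.

Phoenix Quick Reference

# Open a Phoenix page (the dev server runs on port 6006)
agent-browser open "http://localhost:6006/datasets"

# Get an interactive snapshot with element refs
agent-browser snapshot -i

# Click using a ref from the snapshot
agent-browser click @e5

# Fill a form field
agent-browser fill @e2 "test value"

# Get an element's text
agent-browser get text @e1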

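Selector Discovery Workflow

Open the page: agent-browser open "http://localhost:6006/datasets"
Take a snapshot: agent-browser snapshot -i
Find the element ref in the output (e.g., @e1 [button] "New Dataset")
Interact: agent-browser click @e1
Re-snapshot after navigation or DOM changes: agent-browser snapshot -i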

Converting to Playwright

agent-browser output	Playwright selector
`@e1 [button] "Save"`	`page.getByRole("button", { name: "Save" })`
`@e2 [link] "Datasets"`	`page.getByRole("link", { name: "Datasets" })`
`@e3 [textbox] "Name"`	`page.getByRole("textbox", { name: "Name" })`
`@e4 [menuitem] "Edit"`	`page.getByRole("menuitem", { name: "Edit" })`
`@e5 [tab] "Evaluators 0"`	`page.getByRole("tab", { name: /Evaluators/i })`
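
The mapping in the table is mechanical enough to sketch as a small function. The version below assumes snapshot lines look exactly like the examples in the table (a ref, a role in square brackets, then a quoted name); the real agent-browser output may carry more detail, and snapshotLineToLocator is a hypothetical name.

```typescript
// Hypothetical helper: turn one snapshot line, e.g.
//   @e1 [button] "Save"
// into the source text of the matching Playwright role locator.
// Returns null when the line does not follow the assumed format.
function snapshotLineToLocator(line: string): string | null {
  const match = line.trim().match(/^@\w+\s+\[(\w+)\]\s+"(.*)"$/);
  if (!match) return null;
  const [, role, name] = match;
  // JSON.stringify adds the quotes and escapes any embedded ones.
  return `page.getByRole(${JSON.stringify(role)}, { name: ${JSON.stringify(name)} })`;
}
```

The function always emits an exact string name. For names that embed dynamic content, such as the count in "Evaluators 0", the output still needs to be edited by hand into a regular expression like the last table row.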

File Naming

Feature tests: {feature-name}.spec.ts
Access control: {role}-access.spec.ts
Rate limiting: {feature}.rate-limit.spec.ts (runs last)
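
The three naming patterns can be told apart by suffix alone. The sketch below classifies a file name and orders a list so that rate-limit specs come last; it only illustrates the convention, since how the Phoenix test setup actually enforces that ordering is not stated here. classifySpec and orderSpecs are hypothetical names.

```typescript
type SpecKind = "rate-limit" | "access-control" | "feature" | "other";

// Classify a spec file by the naming convention listed above.
function classifySpec(fileName: string): SpecKind {
  if (!fileName.endsWith(".spec.ts")) return "other";
  if (fileName.endsWith(".rate-limit.spec.ts")) return "rate-limit";
  if (fileName.endsWith("-access.spec.ts")) return "access-control";
  return "feature";
}

// Keep the original relative order, but move rate-limit specs to the end.
function orderSpecs(fileNames: string[]): string[] {
  const isRateLimit = (f: string): boolean => classifySpec(f) === "rate-limit";
  return [
    ...fileNames.filter((f) => !isRateLimit(f)),
    ...fileNames.filter(isRateLimit),
  ];
}
```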

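Common Pitfalls

Dialog does not close: wait for a deterministic post-action signal (e.g., dialog hidden + success row visible)
Multiple elements: use .first(), .last(), or .nth(n)
Dynamic content: use a regular expression in the name: { name: /pattern/i }
Flaky waits: prefer waitForURL over waitForTimeout
Menu does not appear: wait for a specific menu state or for element visibility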

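Debugging Flaky Tests

Key Lessons Learned

Don't assume parallelism is the problem
- Phoenix tests run with 7 parallel workers without issues
- The app handles concurrent logins, database operations, and session management correctly
- If tests fail under parallelism, it is usually a timing problem in the test, not the infrastructure
- Playwright's browser context isolation is robust - each worker gets isolated cookies/sessions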

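waitForTimeout is almost always wrong

page.waitForTimeout() is the main cause of flakiness in Phoenix tests
Arbitrary timeouts race against rendering and network speed

Always replace them with state-based waits:

// ❌ BAD - flaky, races against rendering
await page.waitForTimeout(500);
await element.click();

// ✅ GOOD - waits for the actual state
await element.waitFor({ state: "visible" });
await element.click();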

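Reproduce the actual failure before fixing
- Run the tests with parallelism enabled to see what actually fails
- Check the error messages - they usually point to the real problem
- Don't optimize prematurely (e.g., caching auth state) if that is not the problem

Phoenix test infrastructure is solid
- In-memory SQLite works fine with parallel tests
- No per-worker database is needed
- No auth state caching is needed
- Tests use randomUUID() for data isolation - this works well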

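Debugging Workflow

When a test is flaky:

Run the tests in parallel several times to catch intermittent failures:

for i in 1 2 3 4 5; do
  pnpm exec playwright test --project=chromium --reporter=dot
done

Look for waitForTimeout usage and replace it with proper waits:
```
grep -r "waitForTimeout" app/tests/
```
Check for race conditions in element interactions:
- Wait for element visibility before interacting
- Wait for network idle only as a last resort, since page.waitForLoadState("networkidle") tends to be flaky in CI
- Use waitForURL after navigation actions
Verify selector stability:
- Avoid CSS selectors that depend on DOM structure
- Use role/label selectors that match ARIA attributes
- Check that selectors do not break when the UI is updated

Run with tracing to see what happened on failure (with on-first-retry, a trace is recorded only when a failed test is retried, so retries must be enabled):

pnpm exec playwright test --trace on-first-retry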
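
The grep step in the workflow above can also be expressed as a small function, for example to fail a lint script when new waitForTimeout calls appear. This is a sketch under the assumption that a plain text match is good enough; it also flags commented-out calls, and findWaitForTimeout is a hypothetical name.

```typescript
// Hypothetical helper: return the 1-based line numbers in a test source
// that contain a waitForTimeout(...) call.
function findWaitForTimeout(source: string): number[] {
  return source
    .split("\n")
    .map((text, index) => ({ text, line: index + 1 }))
    .filter(({ text }) => /\bwaitForTimeout\s*\(/.test(text))
    .map(({ line }) => line);
}
```

Reading each file under app/tests/ with readFileSync from node:fs and passing its contents to this function would give the same hits as the grep command, with line numbers.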

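Common Flaky Patterns and Fixes

Flaky pattern	Root cause	Fix
Submenu item not found	Using `getByText()` instead of `getByRole()`	Use `getByRole("menuitem", { name: /pattern/i })` for submenu items
Menu click fails	Menu not fully rendered	`await menu.waitFor({ state: "visible" })` before clicking
Dialog assertion fails	Dialog animation not finished	Assert a specific completion signal (hidden dialog + next-state element)
Navigation timeout	Page still loading	Remove `waitForLoadState("networkidle")` - flaky in CI
Element not found	Dynamic content still loading	Wait for element visibility, not an arbitrary timeout
Stale element	Re-render between locating and clicking	Store locators, not element handles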

Test Stability Best Practices

Use proper waits:

// Wait for an element state ("visible", "hidden", or "attached")
await element.waitFor({ state: "visible" });

// Wait for a load state ("networkidle", "domcontentloaded", or "load")
await page.waitForLoadState("domcontentloaded");

// Wait for a URL change
await page.waitForURL("**/expected-path");

Use unique test data:

const uniqueName = `test-${randomUUID()}`;
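
When several spec files need such names, the pattern can be wrapped in a one-line helper so that the prefix also says what kind of entity a leftover row belongs to. A minimal sketch, with uniqueTestName as a hypothetical name:

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: prefix plus a random UUID, so entities created by
// parallel workers get distinct names.
function uniqueTestName(prefix: string): string {
  return `${prefix}-${randomUUID()}`;
}
```

For example, `uniqueTestName("dataset")` yields a name that starts with "dataset-" followed by a 36-character UUID.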

Prefer role selectors - they are less brittle:

page.getByRole("button", { name: "Save" }) // ✅ GOOD
page.locator('button.save-btn') // ❌ BRITTLE

Don't fight animations - wait for them:

await expect(dialog).not.toBeVisible();

Verify URL changes after navigation:
```
await page.waitForURL("**/datasets");
```
