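name: mlflow-experiment-tracker
description: MLflow integration skill for experiment tracking, model registration, and artifact management. Enables an LLM to log experiments, compare runs, manage the model lifecycle, and retrieve artifacts through the MLflow API.
allowed-tools: Read, Grep, Write, Bash, Edit, Glob, WebFetch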

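MLflow Experiment Tracker

Integrates with MLflow to provide machine learning experiment tracking, model registry operations, and artifact management.

Overview

This skill provides the ability to interact with an MLflow tracking server and model registry. It supports automatic experiment logging, run comparison, model versioning, and artifact retrieval within machine learning workflows.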

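Features

Experiment Management

Create and manage experiments
Start and end runs programmatically
Set experiment tags and descriptions
List and search experiments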
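
The experiment operations listed above can be sketched as a small "get or create" helper. This is a minimal sketch, not part of the skill itself: the helper name `ensure_experiment` is hypothetical, and `client` is assumed to behave like `mlflow.MlflowClient` (only `get_experiment_by_name`, `create_experiment`, and `set_experiment_tag` are used).

```python
from typing import Any, Dict, Optional


def ensure_experiment(client: Any, name: str,
                      tags: Optional[Dict[str, str]] = None) -> str:
    """Return the ID of the experiment called `name`, creating it if needed.

    `client` is assumed to expose the MlflowClient methods used below.
    """
    tags = tags or {}
    existing = client.get_experiment_by_name(name)
    if existing is None:
        # create_experiment returns the new experiment ID as a string.
        return client.create_experiment(name, tags=tags)
    # The experiment already exists: refresh its tags instead of recreating it.
    for key, value in tags.items():
        client.set_experiment_tag(existing.experiment_id, key, value)
    return existing.experiment_id
```

With MLflow installed, a call might look like `ensure_experiment(MlflowClient(), "my-classification-experiment", {"team": "ml"})`, where `MlflowClient` is imported from `mlflow`. Passing the client in as an argument keeps the helper easy to exercise against a stub.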

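Parameter and Metric Logging

Log hyperparameters to ensure reproducibility
Track metrics during training (loss, accuracy, etc.)
Log batch metrics with timestamps
Set run tags for organization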
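
As an illustration of the logging items above, the sketch below flattens a nested hyperparameter config into dotted keys and logs per-batch losses with explicit millisecond timestamps. The function names and the `batch_loss` metric key are hypothetical; `client` is assumed to behave like `mlflow.MlflowClient`, whose `log_param` and `log_metric` methods take a run ID.

```python
import time
from typing import Any, Dict, Iterable


def flatten_params(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested config into dotted keys, e.g. {"optimizer.lr": 0.01}."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_config(client: Any, run_id: str, config: Dict[str, Any]) -> None:
    """Log every leaf of `config` as a run parameter."""
    for key, value in flatten_params(config).items():
        client.log_param(run_id, key, value)


def log_batch_losses(client: Any, run_id: str, losses: Iterable[float],
                     start_step: int = 0) -> int:
    """Log one "batch_loss" point per batch; return the number of points logged."""
    count = 0
    for offset, loss in enumerate(losses):
        timestamp_ms = int(time.time() * 1000)  # MLflow timestamps are in milliseconds
        client.log_metric(run_id, "batch_loss", float(loss),
                          timestamp=timestamp_ms, step=start_step + offset)
        count += 1
    return count
```

Logging the full flattened config, rather than only the tuned values, keeps a run reproducible from its parameters alone.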

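Artifact Management

Log model artifacts (serialized models, checkpoints)
Store datasets and data samples
Save plots and visualizations
Retrieve artifacts from completed runs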
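
A possible shape for the artifact operations above: write a small JSON report to disk, log it under a fixed artifact directory, and walk a run's artifact tree. This is a sketch under assumptions: the helper names, the `reports` directory, and the `eval_report.json` file name are hypothetical, and `client` is assumed to behave like `mlflow.MlflowClient` (`log_artifact` and `list_artifacts`).

```python
import json
from pathlib import Path
from typing import Any, Dict, List, Optional


def log_json_report(client: Any, run_id: str, report: Dict[str, Any],
                    work_dir: str, artifact_dir: str = "reports") -> str:
    """Write `report` as JSON in `work_dir` and log it as a run artifact.

    Returns the artifact path relative to the run's artifact root.
    """
    local = Path(work_dir) / "eval_report.json"
    local.write_text(json.dumps(report, indent=2, sort_keys=True), encoding="utf-8")
    client.log_artifact(run_id, str(local), artifact_path=artifact_dir)
    return f"{artifact_dir}/{local.name}"


def list_artifact_paths(client: Any, run_id: str,
                        path: Optional[str] = None) -> List[str]:
    """Recursively collect the paths of all files logged to a run."""
    paths: List[str] = []
    for info in client.list_artifacts(run_id, path):
        if info.is_dir:
            paths.extend(list_artifact_paths(client, run_id, info.path))
        else:
            paths.append(info.path)
    return paths
```

Keeping the artifact directory as a parameter with one default makes it easier to use the same paths across runs.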

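Model Registry Operations

Register trained models
Manage model versions
Transition models between stages (Staging, Production, Archived)
Add model descriptions and tags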
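
Descriptions and tags on a registered model version could be set along the following lines. This is a minimal sketch: `document_model_version` is a hypothetical helper, and `client` is assumed to behave like `mlflow.MlflowClient`, using its `update_model_version` and `set_model_version_tag` methods.

```python
from typing import Any, Dict, Optional


def document_model_version(client: Any, name: str, version: str,
                           description: str,
                           tags: Optional[Dict[str, str]] = None) -> None:
    """Attach a description and optional tags to one registered model version."""
    client.update_model_version(name=name, version=version, description=description)
    for key, value in (tags or {}).items():
        client.set_model_version_tag(name=name, version=version, key=key, value=value)
```

A call might pass a description such as the training data snapshot and tags such as `{"validated": "true"}`; the tag keys are entirely up to the project.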

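Run Comparison and Analysis

Compare metrics across runs
Search runs by parameter or metric
Retrieve the best-performing run
Generate comparison visualizations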
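
Searching runs by parameter or metric relies on a filter string. The sketch below builds one from metric thresholds and exact parameter matches; `build_run_filter` is a hypothetical helper, and it assumes names and values contain no quotes or other characters that would need escaping.

```python
from typing import Any, Dict, List, Optional, Tuple

_ALLOWED_OPS = {">", ">=", "<", "<=", "=", "!="}


def build_run_filter(metrics: Optional[Dict[str, Tuple[str, float]]] = None,
                     params: Optional[Dict[str, Any]] = None) -> str:
    """Build a filter string such as "metrics.accuracy > 0.9 and params.batch_size = '32'"."""
    clauses: List[str] = []
    for name, (op, threshold) in (metrics or {}).items():
        if op not in _ALLOWED_OPS:
            raise ValueError(f"unsupported comparison operator: {op!r}")
        clauses.append(f"metrics.{name} {op} {threshold}")
    for name, value in (params or {}).items():
        # Parameter values are stored as strings, so they are quoted in the filter.
        clauses.append(f"params.{name} = '{value}'")
    return " and ".join(clauses)
```

The result can be passed as `filter_string` to `mlflow.search_runs`, together with `order_by` to put the best run first.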

Prerequisites

MLflow Installation

pip install "mlflow>=2.0.0"

MLflow Tracking Server

Configure the tracking URI:

import mlflow
mlflow.set_tracking_uri("http://localhost:5000")  # or a remote server
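
To avoid hard-coding the server address, the URI can be resolved from the environment first. This is a small sketch: `resolve_tracking_uri` is a hypothetical helper, and it reads the `MLFLOW_TRACKING_URI` environment variable, falling back to the local address used above.

```python
import os


def resolve_tracking_uri(default: str = "http://localhost:5000") -> str:
    """Return MLFLOW_TRACKING_URI if it is set and non-empty, else `default`."""
    uri = os.environ.get("MLFLOW_TRACKING_URI", "").strip()
    return uri or default
```

The result can then be passed to `mlflow.set_tracking_uri(...)`.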

Optional: MLflow MCP Server

For tighter LLM integration, install an MLflow MCP server:

pip install "mlflow>=3.4"  # official MCP support
# or
pip install mlflow-mcp   # community server

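Usage Patterns

Starting an Experiment Run

import mlflow

# Set the active experiment (created if it does not exist yet)
mlflow.set_experiment("my-classification-experiment")

# Start the run with a context manager so it is ended automatically
with mlflow.start_run(run_name="baseline-model"):
    # Log parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 32)
    mlflow.log_param("epochs", 100)

    # Log metrics during training
    for epoch in range(100):
        train_loss = train_one_epoch()  # placeholder for your own training step
        mlflow.log_metric("train_loss", train_loss, step=epoch)

    # Log final metrics
    mlflow.log_metric("accuracy", 0.95)
    mlflow.log_metric("f1_score", 0.93)

    # Log the model artifact (`model` is your fitted scikit-learn estimator)
    mlflow.sklearn.log_model(model, "model")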

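Searching and Comparing Runs

import mlflow

# Search runs with a filter; the result is a pandas DataFrame by default
runs = mlflow.search_runs(
    experiment_names=["my-classification-experiment"],
    filter_string="metrics.accuracy > 0.9",
    order_by=["metrics.accuracy DESC"],
    max_results=10
)

# Get the best run (the first row, because results are sorted by accuracy)
best_run = runs.iloc[0]
print(f"Best run ID: {best_run.run_id}")
print(f"Best accuracy: {best_run['metrics.accuracy']}")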

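Model Registry Operations

import mlflow

# Register a model from a run (`run_id` is the ID of the run that logged it)
model_uri = f"runs:/{run_id}/model"
mlflow.register_model(model_uri, "production-classifier")

# Transition the model stage
# Note: recent MLflow releases deprecate stages in favor of model version aliases
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="production-classifier",
    version=1,
    stage="Production"
)

# Load the production model
model = mlflow.pyfunc.load_model("models:/production-classifier/Production")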

Integration with the Babysitter SDK

Task Definition Example

const mlflowTrackingTask = defineTask({
  name: 'mlflow-experiment-tracking',
  description: 'Track ML experiments with MLflow',

  inputs: {
    experimentName: { type: 'string', required: true },
    runName: { type: 'string', required: true },
    parameters: { type: 'object', required: true },
    metrics: { type: 'object', required: true },
    modelPath: { type: 'string' }
  },

  outputs: {
    runId: { type: 'string' },
    experimentId: { type: 'string' },
    artifactUri: { type: 'string' }
  },

  async run(inputs, taskCtx) {
    return {
      kind: 'skill',
      title: `Track experiment: ${inputs.experimentName}/${inputs.runName}`,
      skill: {
        name: 'mlflow-experiment-tracker',
        context: {
          operation: 'log_run',
          experimentName: inputs.experimentName,
          runName: inputs.runName,
          parameters: inputs.parameters,
          metrics: inputs.metrics,
          modelPath: inputs.modelPath
        }
      },
      io: {
        inputJsonPath: `tasks/${taskCtx.effectId}/input.json`,
        outputJsonPath: `tasks/${taskCtx.effectId}/result.json`
      }
    };
  }
});

MCP Server Integration

Using the mlflow-mcp Server

{
  "mcpServers": {
    "mlflow": {
      "command": "uvx",
      "args": ["mlflow-mcp"],
      "env": {
        "MLFLOW_TRACKING_URI": "http://localhost:5000"
      }
    }
  }
}

Available MCP Tools

mlflow_list_experiments - List all experiments
mlflow_search_runs - Search runs with filters
mlflow_get_run - Get run details
mlflow_log_metric - Log a metric
mlflow_log_param - Log a parameter
mlflow_list_artifacts - List run artifacts
mlflow_get_model_version - Get model version details
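
MCP clients invoke tools such as these with a JSON-RPC `tools/call` request. The sketch below builds such a message; `build_tool_call` is a hypothetical helper, and the argument names in the usage note are assumptions, since the argument schema of each tool is not given here.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request for the given tool name."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)
```

For example, `build_tool_call(1, "mlflow_search_runs", {"filter_string": "metrics.accuracy > 0.9"})` would produce a request for the search tool, assuming it accepts a `filter_string` argument. In practice an MCP client library normally constructs these messages for you.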

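Best Practices

Consistent naming: use descriptive experiment and run names
Complete logging: log all hyperparameters, not just the tuned ones
Metric granularity: log metrics at appropriate intervals
Artifact organization: use consistent artifact paths
Model documentation: add descriptions to registered models
Stage management: use an appropriate staging workflow (None -> Staging -> Production)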
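
The staging workflow in the last item can be enforced with a small guard before each transition. This is a sketch of one possible policy: the `ALLOWED_TRANSITIONS` table and both function names are assumptions, MLflow itself does not enforce this ordering, and `client` is assumed to behave like `mlflow.MlflowClient`.

```python
from typing import Any, Dict, Set

# Assumed promotion policy; adjust to your own review process.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "None": {"Staging"},
    "Staging": {"Production", "Archived"},
    "Production": {"Archived"},
    "Archived": set(),
}


def check_transition(current: str, target: str) -> None:
    """Raise ValueError if moving from `current` to `target` skips a step."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown stage: {current!r}")
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"transition {current} -> {target} is not allowed")


def promote(client: Any, name: str, version: str, current: str, target: str) -> Any:
    """Validate the transition, then ask the registry to apply it."""
    check_transition(current, target)
    return client.transition_model_version_stage(name=name, version=version, stage=target)
```

With this policy a model version cannot jump straight from None to Production; it has to pass through Staging first.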
