Name: pandas DataFrame Analyzer Skill
Rating: 5 (24 reviews)
Author: a5c

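name: pandas-dataframe-analyzer
description: Automated DataFrame analysis skill for statistical summaries, missing-value detection, data type inference, and memory optimization recommendations.
allowed-tools: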

Read
Write
Bash
Glob
Grep

pandas-dataframe-analyzer

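Overview

Automated DataFrame analysis using pandas and related data analysis libraries, covering statistical summaries, missing-value pattern detection, data type optimization recommendations, and memory usage analysis.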
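The data type optimization and memory analysis mentioned above could be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `suggest_dtype_optimizations`, the `max_category_ratio` threshold, and the wording of the `type` and `impact` values are all assumptions. The record keys follow the `recommendations` items in the output schema. Float columns are deliberately skipped because downcasting them can lose precision.

```python
import pandas as pd
from pandas.api import types as ptypes


def suggest_dtype_optimizations(df: pd.DataFrame, max_category_ratio: float = 0.5) -> list:
    """Return recommendation records for columns that fit a smaller dtype.

    Hypothetical helper: the threshold and record wording are illustrative only.
    """
    recommendations = []
    for name in df.columns:
        col = df[name]
        before = col.memory_usage(deep=True, index=False)

        if ptypes.is_integer_dtype(col) and not col.isna().any():
            # Pick the smallest integer dtype that still holds every value.
            candidate = pd.to_numeric(col, downcast="integer")
        elif (ptypes.is_object_dtype(col) or ptypes.is_string_dtype(col)) and len(col) > 0:
            # Low-cardinality text columns are usually cheaper as categoricals.
            if col.nunique(dropna=True) / len(col) > max_category_ratio:
                continue
            candidate = col.astype("category")
        else:
            continue

        after = candidate.memory_usage(deep=True, index=False)
        if after < before:
            recommendations.append({
                "type": "dtype_optimization",
                "column": str(name),
                "suggestion": f"convert from {col.dtype} to {candidate.dtype}",
                "impact": f"about {before - after} bytes saved ({before} to {after})",
            })
    return recommendations
```

Calling `suggest_dtype_optimizations(df)` on a loaded DataFrame returns a list that can be placed directly under `recommendations` in the report.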

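Capabilities

Statistical analysis of DataFrames
Missing-value pattern detection
Data type optimization recommendations
Memory usage analysis
Duplicate detection and handling
Distribution analysis and visualization
Correlation matrix computation
Cardinality analysis of categorical features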
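Several of the checks listed above map onto single pandas calls. The sketch below shows one possible way to combine duplicate detection, per-column missing counts, a correlation matrix, and cardinality counts. The function name `quality_checks` and the shape of the returned dictionary are assumptions made for illustration, and treating every non-numeric column as categorical is a simplification.

```python
import pandas as pd


def quality_checks(df: pd.DataFrame) -> dict:
    """Run a few basic data-quality checks (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    non_numeric = df.select_dtypes(exclude="number")
    return {
        # Rows that exactly repeat an earlier row.
        "duplicateRows": int(df.duplicated().sum()),
        # Number of missing values in each column.
        "missingByColumn": df.isna().sum().to_dict(),
        # Pearson correlation between numeric columns (a DataFrame).
        "correlation": numeric.corr(),
        # Distinct non-null values per non-numeric column.
        "cardinality": non_numeric.nunique(dropna=True).to_dict(),
    }
```

Duplicates found this way can be dropped with `df.drop_duplicates()`, although whether dropping is appropriate depends on the dataset.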

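Target Processes

Exploratory data analysis (EDA) process
Data collection and validation process
Feature engineering design and implementation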

Tools and Libraries

pandas
ydata-profiling (formerly pandas-profiling)
numpy
scipy (for statistical tests)

Input Schema

{
  "type": "object",
  "required": ["dataPath"],
  "properties": {
    "dataPath": {
      "type": "string",
      "description": "Path to the data file (CSV, Parquet, or JSON)"
    },
    "sampleSize": {
      "type": "integer",
      "description": "Number of rows to sample for analysis",
      "default": 10000
    },
    "profileType": {
      "type": "string",
      "enum": ["minimal", "standard", "full"],
      "default": "standard"
    },
    "outputFormat": {
      "type": "string",
      "enum": ["json", "html", "markdown"],
      "default": "json"
    }
  }
}
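One way the `dataPath` and `sampleSize` inputs might be handled is sketched below. The function name `load_dataframe`, the dispatch on file extension, and the use of random sampling (rather than taking the first rows) are assumptions, since the schema only says that a number of rows is sampled. Reading Parquet additionally requires an engine such as pyarrow to be installed.

```python
from pathlib import Path
from typing import Optional

import pandas as pd


def load_dataframe(data_path: str, sample_size: Optional[int] = 10000, seed: int = 0) -> pd.DataFrame:
    """Load a CSV, Parquet, or JSON file and optionally down-sample it (hypothetical helper)."""
    suffix = Path(data_path).suffix.lower()
    if suffix == ".csv":
        df = pd.read_csv(data_path)
    elif suffix == ".parquet":
        df = pd.read_parquet(data_path)  # needs pyarrow or fastparquet
    elif suffix == ".json":
        df = pd.read_json(data_path)
    else:
        raise ValueError(f"Unsupported file format: {suffix!r}")

    # Sample only when the file has more rows than requested.
    if sample_size is not None and len(df) > sample_size:
        df = df.sample(n=sample_size, random_state=seed)
    return df
```

A fixed `random_state` keeps repeated analyses of the same file comparable. Note that this sketch reads the whole file before sampling, so it does not reduce peak memory for very large inputs.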

Output Schema

{
  "type": "object",
  "required": ["summary", "columns", "recommendations"],
  "properties": {
    "summary": {
      "type": "object",
      "properties": {
        "rowCount": { "type": "integer" },
        "columnCount": { "type": "integer" },
        "memoryUsageMB": { "type": "number" },
        "duplicateRows": { "type": "integer" },
        "missingCells": { "type": "integer" },
        "missingCellsPercent": { "type": "number" }
      }
    },
    "columns": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "dtype": { "type": "string" },
          "nullCount": { "type": "integer" },
          "uniqueCount": { "type": "integer" },
          "stats": { "type": "object" }
        }
      }
    },
    "recommendations": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "type": { "type": "string" },
          "column": { "type": "string" },
          "suggestion": { "type": "string" },
          "impact": { "type": "string" }
        }
      }
    }
  }
}
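To make the schema concrete, here is a rough sketch of how a report with this shape could be assembled from a DataFrame. The function name `build_report` is hypothetical, the contents of `stats` (the output of `describe()` for numeric columns) are an assumption because the schema leaves that object open, and `missingCellsPercent` is assumed to be on a 0 to 100 scale.

```python
from typing import Optional

import pandas as pd
from pandas.api import types as ptypes


def build_report(df: pd.DataFrame, recommendations: Optional[list] = None) -> dict:
    """Assemble a dictionary shaped like the output schema (hypothetical helper)."""
    total_cells = int(df.shape[0] * df.shape[1])
    missing_cells = int(df.isna().sum().sum())
    summary = {
        "rowCount": int(df.shape[0]),
        "columnCount": int(df.shape[1]),
        "memoryUsageMB": float(df.memory_usage(deep=True).sum() / 1024 ** 2),
        "duplicateRows": int(df.duplicated().sum()),
        "missingCells": missing_cells,
        # Assumed to be a percentage in the range 0 to 100.
        "missingCellsPercent": 100.0 * missing_cells / total_cells if total_cells else 0.0,
    }

    columns = []
    for name in df.columns:
        col = df[name]
        stats = {}
        if ptypes.is_numeric_dtype(col) and not ptypes.is_bool_dtype(col):
            # count, mean, std, min, quartiles, max as plain floats.
            stats = {key: float(value) for key, value in col.describe().items()}
        columns.append({
            "name": str(name),
            "dtype": str(col.dtype),
            "nullCount": int(col.isna().sum()),
            "uniqueCount": int(col.nunique(dropna=True)),
            "stats": stats,
        })

    return {
        "summary": summary,
        "columns": columns,
        "recommendations": list(recommendations or []),
    }
```

All values are converted to built-in Python types so the result can be passed to `json.dumps`. Statistics that come out as NaN (for example the standard deviation of a single-row column) would need extra handling to produce strictly valid JSON.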

Usage Example

{
  kind: 'skill',
  title: 'Analyze training dataset',
  skill: {
    name: 'pandas-dataframe-analyzer',
    context: {
      dataPath: 'data/train.csv',
      profileType: 'full',
      outputFormat: 'json'
    }
  }
}