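Universal Dataset Import for FiftyOne

Import any dataset into FiftyOne regardless of media type, label format, or folder structure. The skill automatically detects and handles:

All media types: images, videos, point clouds, 3D scenes
All label formats: COCO, YOLO, VOC, CVAT, KITTI, OpenLABEL, and more
Multimodal groups: multiple cameras plus LiDAR per scene (autonomous driving, robotics)
Complex folder structures: nested directories, scene-based organization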

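Use this skill when:

Importing datasets from any source or format
Working with autonomous driving data (multiple cameras, LiDAR, radar)
Loading multimodal data that needs grouping
The user does not know or does not specify the exact format
Importing point clouds, 3D scenes, or mixed media types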

Prerequisites

FiftyOne MCP server installed and running
@voxel51/io plugin for importing data
@voxel51/utils plugin for dataset management

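Key Directives

Always follow these rules:

1. Scan the folder first

Before any import, scan the directory in depth to understand its structure:

# Explore with bash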
find /path/to/data -type f | head -50
ls -la /path/to/data

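2. Auto-detect everything

Automatically detect media types, label formats, and grouping patterns. Never ask the user to specify a format that can be inferred.

3. Detect multimodal groups

Look for patterns that indicate grouped data:

Scene folders containing multiple media files
Filename patterns with a common prefix (e.g., scene_001_left.jpg, scene_001_right.jpg)
Mixed media types that should be grouped (images plus point clouds)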

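4. Detect and install required packages

Many specialized dataset formats require external Python packages. After detecting the format:

Identify the required packages from the detected format
Check whether each package is installed with pip show <package>
If needed, search for installation instructions (web search or the FiftyOne documentation)
Ask the user for permission before installing any package
Install the required packages (see the installation methods below)
Verify the installation before continuing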
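
The installation check can also be done from Python rather than by parsing pip show output. The following is a minimal sketch using the standard library's importlib.metadata; the helper names is_installed and missing_packages are hypothetical and not part of FiftyOne or any plugin:

```python
from importlib import metadata


def is_installed(dist_name):
    """Return True if a distribution with this name is installed."""
    try:
        metadata.version(dist_name)
        return True
    except metadata.PackageNotFoundError:
        return False


def missing_packages(required):
    """Return the subset of required distribution names that are not installed."""
    return [name for name in required if not is_installed(name)]


if __name__ == "__main__":
    # Distribution names as listed in the mapping tables of this guide
    print(missing_packages(["pandaset", "open3d"]))
```

The names passed in must be distribution names (what pip show accepts), which can differ from the import name or the repository name. The result is the list to present to the user when asking for permission to install.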

Common format-to-package mapping:

Dataset format	Package name	Install command
PandaSet	`pandaset`	`pip install "git+https://github.com/scaleapi/pandaset-devkit.git#subdirectory=python"`
nuScenes	`nuscenes-devkit`	`pip install nuscenes-devkit`
Waymo Open	`waymo-open-dataset-tf`	See the Waymo documentation (requires TensorFlow)
Argoverse 2	`av2`	`pip install av2`
KITTI 3D	`pykitti`	`pip install pykitti`
Lyft L5	`l5kit`	`pip install l5kit`
A2D2	`a2d2`	See the Audi A2D2 documentation

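Additional packages for 3D processing:

Purpose	Package name	Install command
Point cloud conversion to PCD	`open3d`	`pip install open3d`
Point cloud processing	`pyntcloud`	`pip install pyntcloud`
LAS/LAZ point clouds	`laspy`	`pip install laspy`

Installation methods (in order of preference):

PyPI - standard pip install: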
```
pip install <package-name>
```

GitHub URL - when the package is not on PyPI:

# Standard GitHub install
pip install "git+https://github.com/<org>/<repo>.git"

# With a subdirectory (for monorepos)
pip install "git+https://github.com/<org>/<repo>.git#subdirectory=python"

# Specific branch or tag
pip install "git+https://github.com/<org>/<repo>.git@v1.0.0"

Clone and install - for complex builds:

git clone https://github.com/<org>/<repo>.git
cd <repo>
pip install .

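Dynamic package discovery workflow:

If the format is not in the tables above:

Search PyPI for <format-name>, <format-name>-devkit, or <format-name>-sdk
Search GitHub for <format-name> devkit or <format-name> python
Search the web for "FiftyOne import <format-name>" or "<format-name> python tutorial"
Check the dataset's official website for developer tools/SDKs
Present the installation options to the user

After installation:

Verify that the package is installed: pip show <package-name>
Test the import in Python: python -c "from <package> import ..."
Search for FiftyOne integration examples, or write custom import code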

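5. Confirm before importing

Present the findings to the user and explicitly ask for confirmation before creating the dataset. Always end the scan summary with a clear question:

“Proceed with import?”
“Should I create the dataset with these settings?”

Wait for the user's response before continuing. Do not create the dataset until the user confirms.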

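6. Check for existing datasets

Before creating a dataset, check whether the proposed name already exists:

list_datasets()

If the dataset name already exists, ask the user to choose:

Overwrite: delete the existing dataset first
Rename: use a different name (suggest alternatives such as dataset-name-v2)
Abort: cancel the import

7. Validate after importing

Compare the number of imported samples with the number of source files. Report any discrepancies.

8. Report minimal errors to the user

Keep error messages shown to the user simple. Use the detailed error information internally to diagnose problems.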

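Complete Workflow

Step 1: Deep folder scan

Scan the target directory to understand its structure:

# Count files by extension
find /path/to/data -type f | sed 's/.*\.//' | sort | uniq -c | sort -rn

# List the directory structure (2 levels deep)
find /path/to/data -maxdepth 2 -type d

# Sample some files
ls -la /path/to/data/* | head -20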

# IMPORTANT: Scan for ALL annotation/label directories
ls -la /path/to/data/annotations/ 2>/dev/null || ls -la /path/to/data/labels/ 2>/dev/null

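Build an inventory of:

Media files by type (images, videos, point clouds, 3D)
Label files by format (JSON, XML, TXT, YAML, PKL)
Directory structure (flat vs. nested vs. scene-based)
ALL annotation types present (cuboids, segmentation, tracking, etc.)

For 3D/autonomous driving datasets, specifically check:

# List all annotation subdirectories (parentheses make -type d apply to both names)
find /path/to/data -type d \( -name "annotations" -o -name "labels" \) | xargs -I {} ls -la {}

# Sample an annotation file to understand its structure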
python3 -c "import pickle, gzip; print(pickle.load(gzip.open('path/to/annotation.pkl.gz', 'rb'))[:2])"

Step 2: Identify media types

Classify files by extension:

Extension	Media type	FiftyOne type
`.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.webp`, `.tiff`	Image	`image`
`.mp4`, `.avi`, `.mov`, `.mkv`, `.webm`	Video	`video`
`.pcd`, `.ply`, `.las`, `.laz`	Point cloud	`point-cloud`
`.fo3d`, `.obj`, `.gltf`, `.glb`	3D scene	`3d`
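
The table above can be applied mechanically during the scan. The following is a minimal sketch that classifies paths by extension and tallies them; MEDIA_TYPES, media_type_for, and count_media are hypothetical helper names, and the mapping simply mirrors the table:

```python
from collections import Counter
from pathlib import Path

# Extension-to-media-type mapping, copied from the table in this step
MEDIA_TYPES = {
    "image": {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp", ".tiff"},
    "video": {".mp4", ".avi", ".mov", ".mkv", ".webm"},
    "point-cloud": {".pcd", ".ply", ".las", ".laz"},
    "3d": {".fo3d", ".obj", ".gltf", ".glb"},
}


def media_type_for(path):
    """Return the media type for a path, or None if the extension is not listed."""
    suffix = Path(path).suffix.lower()
    for media_type, extensions in MEDIA_TYPES.items():
        if suffix in extensions:
            return media_type
    return None


def count_media(paths):
    """Count paths per media type, ignoring files that are not media."""
    return Counter(t for t in map(media_type_for, paths) if t is not None)


if __name__ == "__main__":
    print(count_media(["a.jpg", "b.PNG", "frame.pcd", "labels.json"]))
```

Only the final suffix is inspected, so a file such as 00.pkl.gz is not counted as media here; formats like that are handled in the specialized-format steps.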

Step 3: Detect label formats

Identify label formats from file patterns:

Pattern	Format	Dataset type
`annotations.json` or `instances*.json` with COCO structure	COCO	`COCO`
`*.xml` files with Pascal VOC structure	VOC	`VOC`
`*.txt` per image + `classes.txt`	YOLOv4	`YOLOv4`
`data.yaml` + `labels/*.txt`	YOLOv5	`YOLOv5`
`*.txt` per image (KITTI format)	KITTI	`KITTI`
Single `annotations.xml` (CVAT format)	CVAT	`CVAT Image`
`*.json` with OpenLABEL structure	OpenLABEL	`OpenLABEL Image`
Class folder structure	Classification	`Image Classification Directory Tree`
`*.csv` with a filepath column	CSV	`CSV`
`*.json` with GeoJSON structure	GeoJSON	`GeoJSON`
`.dcm` DICOM files	DICOM	`DICOM`
`.tiff` with geographic metadata	GeoTIFF	`GeoTIFF`
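
Several rows in the table depend on the structure of a JSON file rather than its name. As one illustration, the sketch below checks for the three top-level lists that COCO annotation files normally carry (images, annotations, categories). The function name looks_like_coco is hypothetical, and the check is a heuristic, not a full schema validation:

```python
import json


def looks_like_coco(json_path):
    """Heuristically decide whether a JSON file has COCO's top-level layout."""
    try:
        with open(json_path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (OSError, ValueError):
        # Unreadable file or invalid JSON: treat as not COCO
        return False
    return isinstance(data, dict) and all(
        isinstance(data.get(key), list)
        for key in ("images", "annotations", "categories")
    )
```

This loads the whole file, which can be slow for very large annotation files; the same idea extends to the other JSON-based rows by testing for their characteristic top-level keys.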

Specialized autonomous driving formats (require external packages):

Directory pattern	Format	Required package
`camera/`, `lidar/`, `annotations/cuboids/` with `.pkl.gz`	PandaSet	`pandaset-devkit`
`samples/`, `sweeps/`, `v1.0-*` folders	nuScenes	`nuscenes-devkit`
`segment-*` with `.tfrecord` files	Waymo Open	`waymo-open-dataset-tf`
`argoverse-tracking/` structure	Argoverse	`argoverse-api`
`training/`, `testing/` with `calib/`, `velodyne/`	KITTI 3D	`pykitti`
`scenes/`, `aerial_map/`	Lyft L5	`l5kit`
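
The directory patterns above can be checked with a few set comparisons. The following is a rough sketch covering three of the rows; detect_driving_format is a hypothetical helper, and it assumes the listed folders sit directly under the root being scanned, which real downloads do not always guarantee:

```python
from pathlib import Path


def detect_driving_format(root):
    """Guess a specialized format from top-level directory names, or return None."""
    root = Path(root)
    dirs = {p.name for p in root.iterdir() if p.is_dir()}

    # camera/, lidar/, annotations/cuboids/
    if {"camera", "lidar"} <= dirs and (root / "annotations" / "cuboids").is_dir():
        return "PandaSet"
    # samples/, sweeps/, and a v1.0-* metadata folder
    if {"samples", "sweeps"} <= dirs and any(d.startswith("v1.0-") for d in dirs):
        return "nuScenes"
    # training/ and testing/ with a velodyne/ folder under training/
    if {"training", "testing"} <= dirs and (root / "training" / "velodyne").is_dir():
        return "KITTI 3D"
    return None
```

A None result only means that none of these three layouts matched; the remaining rows of the table and the generic label formats from the previous table still need to be checked.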

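Step 4: Detect required packages

After identifying the format, check whether external packages are needed:

# Check whether the package is installed (use the actual package name, not the repository name)
pip show pandaset

# If it is not found, the package must be installed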

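If a package is needed:

Inform the user which packages are needed and why
If the package is not in the common mapping table, search for an installation method:
- Check pypi.org first (the pip search command is no longer supported by PyPI)
- Search GitHub for devkit/SDK repositories
- Check the dataset's official documentation
- Search the web: "<dataset-name> python install"

Ask for permission to install:

This dataset appears to be in PandaSet format, which requires the `pandaset` package.

The package is not on PyPI and must be installed from GitHub:
pip install "git+https://github.com/scaleapi/pandaset-devkit.git#subdirectory=python"

Would you like me to:
- Install the package (recommended)
- Look for an alternative import method
- Abort so you can install it manually

Install using the appropriate method: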

# PyPI (if available)
pip install <package-name>

# GitHub URL (if not on PyPI)
pip install "git+https://github.com/<org>/<repo>.git#subdirectory=python"

# Clone and install (for complex builds)
git clone https://github.com/<org>/<repo>.git && cd <repo> && pip install .

Verify the installation:
```
pip show <package-name>
```

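Test the import in Python:

python -c "from <package> import <main_class>; print('OK')"

Search for FiftyOne integration code:
- Search: "FiftyOne <format-name> import example"
- Search: "<format-name> to FiftyOne grouped dataset"
- Check the FiftyOne documentation for similar dataset types
- If there is no example, build custom import code with the devkit API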

Step 5: Detect grouping patterns

Determine whether the data should be grouped:

Pattern A: Scene folders (most common for multimodal data)

/data/
├── scene_001/
│   ├── left.jpg
│   ├── right.jpg
│   ├── lidar.pcd
│   └── labels.json
├── scene_002/
│   └── ...

Detection: each subfolder = one group, the files inside = slices

Pattern B: Filename prefix

/data/
├── 001_left.jpg
├── 001_right.jpg
├── 001_lidar.pcd
├── 002_left.jpg
├── 002_right.jpg
├── 002_lidar.pcd

Detection: common prefix = group ID, suffix = slice name
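
Pattern B can be detected by splitting each filename stem at its last underscore. The following is a minimal sketch; group_by_prefix is a hypothetical helper, and splitting at the last separator is an assumption that holds for names like 001_left.jpg or scene_001_left.jpg but not for slice names that themselves contain an underscore:

```python
from collections import defaultdict
from pathlib import Path


def group_by_prefix(filenames, sep="_"):
    """Map group ID -> {slice name -> filename}, splitting stems at the last separator."""
    groups = defaultdict(dict)
    for name in filenames:
        prefix, found, suffix = Path(name).stem.rpartition(sep)
        if not found or not prefix or not suffix:
            continue  # no usable separator: cannot infer a group ID and slice name
        groups[prefix][suffix] = name
    return dict(groups)


if __name__ == "__main__":
    files = ["001_left.jpg", "001_right.jpg", "001_lidar.pcd", "002_left.jpg"]
    for group_id, slices in group_by_prefix(files).items():
        print(group_id, sorted(slices))
```

If most files fall into groups that share the same set of slice names, the directory is a good candidate for a grouped dataset; if nearly every file ends up in its own group, treat the data as flat.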

Pattern C: No grouping (flat)

/data/
├── image_001.jpg
├── image_002.jpg
├── image_003.jpg

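Detection: single media type, no apparent grouping pattern

Step 6: Present findings to the user

Before importing, present a clear summary to the user that includes all detected labels: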

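Scan results for /path/to/data:

Media found:
  - 3000 images (.jpg, .png)
  - 1000 point clouds (.pkl.gz → will be converted to .pcd)
  - 0 videos

Grouping detected:
  - Pattern: scene folders
  - Groups: 1000 scenes
  - Slices: left (image), right (image), front (image), lidar (point cloud)

All labels detected:
  ├── cuboids/           (3D bounding boxes, 1000 files)
  │   └── Format: pickle, fields: label, position, dimensions, rotation, track_id
  ├── semseg/            (semantic segmentation, 1000 files)
  │   └── Format: pickle, point-level class labels
  └── instances.json     (2D detections, COCO format)
      └── Classes: 10 (car, pedestrian, cyclist, ...)

Required packages:
  - ✅ pandaset (installed)
  - ⚠️ open3d (needed for PCD conversion) → pip install open3d

Proposed configuration:
  - Dataset name: my-dataset
  - Type: grouped (multimodal)
  - Default slice: front_camera
  - Labels to import:
    - detections_3d (from cuboids/)
    - point_labels (from semseg/)
    - detections (from instances.json)

Proceed with import? (yes/no)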

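Important:

List every annotation type found during the scan
Show the format/structure of each label type
State which labels will be imported and how
Wait for the user's confirmation before continuing

Step 7: Check for existing datasets

Before creating the dataset, check whether the name already exists:

# Check existing datasets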
list_datasets()

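If the proposed dataset name already appears in the list:

Inform the user: "A dataset named 'my-dataset' already exists with X samples."
Ask for their preference:
- Overwrite: delete the existing dataset first
- Rename: suggest alternatives (e.g., my-dataset-v2, my-dataset-20240107)
- Abort: cancel the import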
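
For the rename option, an unused alternative can be derived from the names already returned by list_datasets(). The following is a small sketch of the -v2 style suggestion; suggest_name is a hypothetical helper, and existing is assumed to be a plain list of dataset name strings:

```python
def suggest_name(proposed, existing):
    """Return proposed if it is free, otherwise the first free '<proposed>-vN' with N >= 2."""
    taken = set(existing)
    if proposed not in taken:
        return proposed
    version = 2
    while f"{proposed}-v{version}" in taken:
        version += 1
    return f"{proposed}-v{version}"


if __name__ == "__main__":
    print(suggest_name("my-dataset", ["my-dataset", "my-dataset-v2"]))
```

The suggestion is only offered to the user; the choice between overwrite, rename, and abort remains theirs.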

If the user chooses to overwrite:

# Delete the existing dataset
set_context(dataset_name="my-dataset")
execute_operator(
    operator_uri="@voxel51/utils/delete_dataset",
    params={"name": "my-dataset"}
)

Step 8: Create the dataset

# Create the dataset
execute_operator(
    operator_uri="@voxel51/utils/create_dataset",
    params={
        "name": "my-dataset",
        "persistent": true
    }
)

# Set the context
set_context(dataset_name="my-dataset")

Step 9A: Import a simple dataset (no groups)

For flat datasets without grouping:

# Import media only
execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_ONLY",
        "style": "DIRECTORY",
        "directory": {"absolute_path": "/path/to/images"}
    }
)

# Import media and labels
execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "data_path": {"absolute_path": "/path/to/images"},
        "labels_path": {"absolute_path": "/path/to/annotations.json"},
        "label_field": "ground_truth"
    }
)

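Step 9B: Import a grouped dataset (multimodal)

For multimodal data with groups, use Python directly. Guide the user through:

import fiftyone as fo

# Create the dataset
dataset = fo.Dataset("multimodal-dataset", persistent=True)

# Add the group field
dataset.add_group_field("group", default="front")

# Create samples for each group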
import os
from pathlib import Path

data_dir = Path("/path/to/data")
samples = []

for scene_dir in sorted(data_dir.iterdir()):
    if not scene_dir.is_dir():
        continue

    # Create one group for this scene
    group = fo.Group()

    # Add each file as a slice
    for file in scene_dir.iterdir():
        if file.suffix in ['.jpg', '.png']:
            # Derive the slice name from the filename
            slice_name = file.stem  # e.g., "left", "right", "front"
            samples.append(fo.Sample(
                filepath=str(file),
                group=group.element(slice_name)
            ))
        elif file.suffix == '.pcd':
            samples.append(fo.Sample(
                filepath=str(file),
                group=group.element("lidar")
            ))
        elif file.suffix == '.mp4':
            samples.append(fo.Sample(
                filepath=str(file),
                group=group.element("video")
            ))

# Add all samples
dataset.add_samples(samples)
print(f"Added {len(dataset)} samples in {len(dataset.distinct('group.id'))} groups")

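Step 9C: Import specialized-format datasets (3D/autonomous driving)

For datasets that require external packages (PandaSet, nuScenes, etc.), use the devkit to load the data and convert it to FiftyOne format.

General approach:

Search the FiftyOne documentation or the web for a format-specific import method
Load the raw data with the devkit
Convert point clouds to PCD format (FiftyOne requires .pcd files)
Create fo.Scene objects that reference the point clouds for 3D visualization
Convert the data into FiftyOne samples with appropriate grouping
Import all labels detected during the scan (cuboids, segmentation, etc.)

Converting point clouds to PCD

Many autonomous driving datasets store LiDAR data in proprietary formats (.pkl.gz, .bin, .npy). Convert it to PCD for use in FiftyOne: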

import numpy as np
import open3d as o3d
from pathlib import Path

def convert_to_pcd(points, output_path):
    """
    Convert a point cloud array to a PCD file.

    Args:
        points: numpy array of shape (N, 3) or (N, 4) holding XYZ or XYZI
        output_path: path where the .pcd file is saved
    """
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points[:, :3])

    # If intensity is present, store it as a grayscale color
    if points.shape[1] >= 4:
        intensity = points[:, 3]
        intensity_normalized = (intensity - intensity.min()) / (intensity.max() - intensity.min() + 1e-8)
        colors = np.stack([intensity_normalized] * 3, axis=1)
        pcd.colors = o3d.utility.Vector3dVector(colors)

    o3d.io.write_point_cloud(str(output_path), pcd)
    return output_path

Note: install open3d if needed: pip install open3d

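Creating an fo.Scene for 3D visualization

For each LiDAR frame, create an fo.Scene that references the PCD file:

import fiftyone as fo

# Create a 3D scene for the point cloud
scene = fo.Scene()

# Add the point cloud to the scene
scene.add(
    fo.PointCloud(
        name="lidar",
        pcd_path="/path/to/frame.pcd",
        flag_for_projection=True,  # enable projection onto camera views
    )
)

# Write the scene to an .fo3d file and create a sample that points to it
scene.write("/path/to/scene.fo3d")
sample = fo.Sample(filepath="/path/to/scene.fo3d")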

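Importing all labels detected during the scan

During the folder scan (Step 1), identify all label types present:

# Example: list all annotation directories/files
ls -la /path/to/dataset/annotations/
# Output might show: cuboids/, semseg/, tracking/, instances.json, etc.

Map the detected labels to FiftyOne label types:

Annotation type	FiftyOne label type	Field name
3D cuboids/bounding boxes	`fo.Detection` with 3D attributes	`detections_3d`
Semantic segmentation	`fo.Segmentation`	`segmentation`
Instance segmentation	`fo.Detections` with masks	`instances`
Tracking IDs	Add `track_id` to detections	`tracks`
Classification	`fo.Classification`	`classification`
Keypoints/pose	`fo.Keypoints`	`keypoints`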

Example: complete PandaSet import with labels

import fiftyone as fo
import numpy as np
import open3d as o3d
from pathlib import Path
import gzip
import pickle

data_path = Path("/path/to/pandaset")
pcd_output_dir = data_path / "pcd_converted"
pcd_output_dir.mkdir(exist_ok=True)

# Create a dataset with groups
dataset = fo.Dataset("pandaset", persistent=True)
dataset.add_group_field("group", default="front_camera")

# Get the camera names
camera_names = [d.name for d in (data_path / "camera").iterdir() if d.is_dir()]
frame_count = len(list((data_path / "camera" / "front_camera").glob("*.jpg")))

# Check which labels are present
labels_dir = data_path / "annotations"
available_labels = [d.name for d in labels_dir.iterdir() if d.is_dir()]
print(f"Found label types: {available_labels}")  # e.g., ['cuboids', 'semseg']

samples = []
for frame_idx in range(frame_count):
    frame_id = f"{frame_idx:02d}"
    group = fo.Group()

    # === Add camera images ===
    for cam_name in camera_names:
        img_path = data_path / "camera" / cam_name / f"{frame_id}.jpg"
        if img_path.exists():
            sample = fo.Sample(filepath=str(img_path))
            sample["group"] = group.element(cam_name)
            sample["frame_idx"] = frame_idx
            samples.append(sample)

    # === Convert and add the LiDAR point cloud ===
    lidar_pkl = data_path / "lidar" / f"{frame_id}.pkl.gz"
    if lidar_pkl.exists():
        # Load the pickle
        with gzip.open(lidar_pkl, 'rb') as f:
            lidar_data = pickle.load(f)

        # Extract the points (adjust to the actual data structure)
        if isinstance(lidar_data, dict):
            points = lidar_data.get('points', lidar_data.get('data'))
        else:
            points = np.array(lidar_data)

        # Convert to PCD
        pcd_path = pcd_output_dir / f"{frame_id}.pcd"
        pcd = o3d.geometry.PointCloud()
        pcd.points = o3d.utility.Vector3dVector(points[:, :3])
        o3d.io.write_point_cloud(str(pcd_path), pcd)

        # Create the point cloud sample
        lidar_sample = fo.Sample(filepath=str(pcd_path))
        lidar_sample["group"] = group.element("lidar")
        lidar_sample["frame_idx"] = frame_idx

        # === Load 3D bounding box labels (if available) ===
        # Important: store 3D attributes as flat scalar fields, not lists
        # Using lists (e.g., location=[x,y,z]) causes a "Symbol.iterator" error in the 3D viewer
        if "cuboids" in available_labels:
            cuboids_pkl = labels_dir / "cuboids" / f"{frame_id}.pkl.gz"
            if cuboids_pkl.exists():
                with gzip.open(cuboids_pkl, 'rb') as f:
                    cuboids_df = pickle.load(f)  # PandaSet uses a pandas DataFrame

                detections = []
                for _, row in cuboids_df.iterrows():
                    detection = fo.Detection(
                        label=row.get("label", "object"),
                        bounding_box=[0, 0, 0.01, 0.01],  # minimal 2D placeholder
                    )
                    # Store 3D attributes as flat scalar fields (not lists!)
                    detection["pos_x"] = float(row.get("position.x", 0))
                    detection["pos_y"] = float(row.get("position.y", 0))
                    detection["pos_z"] = float(row.get("position.z", 0))
                    detection["dim_x"] = float(row.get("dimensions.x", 1))
                    detection["dim_y"] = float(row.get("dimensions.y", 1))
                    detection["dim_z"] = float(row.get("dimensions.z", 1))
                    detection["yaw"] = float(row.get("yaw", 0))
                    detection["track_id"] = str(row.get("uuid", ""))
                    detection["stationary"] = bool(row.get("stationary", False))
                    detections.append(detection)

                lidar_sample["ground_truth"] = fo.Detections(detections=detections)

        # === Load semantic segmentation (if available) ===
        if "semseg" in available_labels:
            semseg_pkl = labels_dir / "semseg" / f"{frame_id}.pkl.gz"
            if semseg_pkl.exists():
                with gzip.open(semseg_pkl, 'rb') as f:
                    semseg_data = pickle.load(f)
                # Store as a custom field (point-level labels)
                lidar_sample["point_labels"] = semseg_data.tolist() if hasattr(semseg_data, 'tolist') else semseg_data

        samples.append(lidar_sample)

# Add all samples
dataset.add_samples(samples)
dataset.save()

print(f"Imported {len(dataset)} groups with {len(dataset.select_group_slices())} total samples")
print(f"Slices: {dataset.group_slices}")
print(f"Labels imported: {available_labels}")

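Dynamic import discovery: if there is no example for the format:

Search: "FiftyOne <format-name> import example"
Search: "<format-name> devkit python example"
Read the devkit documentation to understand the data structure

Explore the annotation files to understand the label format:

import pickle, gzip
with gzip.open("annotations/cuboids/00.pkl.gz", "rb") as f:
    data = pickle.load(f)
print(type(data), data[0] if isinstance(data, list) else data)

Build custom import code based on the devkit API and the label structure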

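Step 10: Import additional labels (optional)

If labels were not imported along with the specialized format, add them separately:

# For COCO labels that reference filepaths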
execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "LABELS_ONLY",
        "dataset_type": "COCO",
        "labels_path": {"absolute_path": "/path/to/annotations.json"},
        "label_field": "ground_truth"
    }
)

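Step 11: Validate the import

# Load and validate
load_dataset(name="my-dataset")

# Check that the counts match
dataset_summary(name="my-dataset")

Compare:

Imported samples vs. source files
Groups created vs. expected
Labels imported vs. annotation counts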
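
The three comparisons can be reduced to a single diff between expected and actual counts. The following is a minimal sketch; report_discrepancies is a hypothetical helper, and the dictionary keys are illustrative rather than fields returned by dataset_summary:

```python
def report_discrepancies(expected, actual):
    """Return one message per count that differs between the scan and the import."""
    problems = []
    for key in sorted(set(expected) | set(actual)):
        want = expected.get(key, 0)
        got = actual.get(key, 0)
        if want != got:
            problems.append(f"{key}: expected {want}, imported {got}")
    return problems


if __name__ == "__main__":
    scanned = {"samples": 3000, "groups": 1000}
    imported = {"samples": 2990, "groups": 1000}
    for line in report_discrepancies(scanned, imported):
        print(line)
```

An empty list means the import matches the scan; otherwise each message can be relayed to the user as a short, simple report.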

Step 12: Launch the App and view

launch_app(dataset_name="my-dataset")

# For grouped datasets, view the different slices in the App
# In the App, use the slice selector dropdown

Supported Dataset Types

Media types

Type	Extensions	Description
`image`	`.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp`, `.webp`, `.tiff`	Still images
`video`	`.mp4`, `.avi`, `.mov`, `.mkv`, `.webm`	Video files with frames
`point-cloud`	`.pcd`, `.ply`, `.las`, `.laz`	3D point cloud data
`3d`	`.fo3d`, `.obj`, `.gltf`, `.glb`	3D scenes and meshes

Label formats

Format	Dataset type value	Label types	File pattern
COCO	`COCO`	Detections, segmentation, keypoints	`*.json`
VOC/Pascal	`VOC`	Detections	`*.xml` per image
KITTI	`KITTI`	Detections	`*.txt` per image
YOLOv4	`YOLOv4`	Detections	`*.txt` + `classes.txt`
YOLOv5	`YOLOv5`	Detections	`data.yaml` + `labels/*.txt`
CVAT Image	`CVAT Image`	Classifications, detections, polylines, keypoints	Single `*.xml`
CVAT Video	`CVAT Video`	Frame labels	XML directory
OpenLABEL Image	`OpenLABEL Image`	All types	`*.json` directory
OpenLABEL Video	`OpenLABEL Video`	All types	`*.json` directory
TF Object Detection	`TF Object Detection`	Detections	TFRecords
TF Image Classification	`TF Image Classification`	Classification	TFRecords
Image Classification Tree	`Image Classification Directory Tree`	Classification	One folder per class
Video Classification Tree	`Video Classification Directory Tree`	Classification	One folder per class
Image Segmentation	`Image Segmentation`	Segmentation	Mask images
CSV	`CSV`	Custom fields	`*.csv`
DICOM	`DICOM`	Medical metadata	`.dcm` files
GeoJSON	`GeoJSON`	Geolocation	`*.json`
GeoTIFF	`GeoTIFF`	Geolocation	`.tiff` with geographic metadata
FiftyOne Dataset	`FiftyOne Dataset`	All types	Export format

Common Use Cases

Use case 1: Simple image dataset with COCO labels

# Scan the directory
# Found: 5000 images, annotations.json (COCO format)

execute_operator(
    operator_uri="@voxel51/utils/create_dataset",
    params={"name": "coco-dataset", "persistent": true}
)

set_context(dataset_name="coco-dataset")

execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_AND_LABELS",
        "dataset_type": "COCO",
        "data_path": {"absolute_path": "/path/to/images"},
        "labels_path": {"absolute_path": "/path/to/annotations.json"},
        "label_field": "ground_truth"
    }
)

launch_app(dataset_name="coco-dataset")

Use case 2: YOLO dataset

# Scan the directory
# Found: data.yaml, images/, labels/ (YOLOv5 format)

execute_operator(
    operator_uri="@voxel51/utils/create_dataset",
    params={"name": "yolo-dataset", "persistent": true}
)

set_context(dataset_name="yolo-dataset")

execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_AND_LABELS",
        "dataset_type": "YOLOv5",
        "dataset_dir": {"absolute_path": "/path/to/yolo/dataset"},
        "label_field": "ground_truth"
    }
)

launch_app(dataset_name="yolo-dataset")

Use case 3: Point cloud dataset

# Scan the directory
# Found: 1000 .pcd files, labels/ in KITTI format

execute_operator(
    operator_uri="@voxel51/utils/create_dataset",
    params={"name": "lidar-dataset", "persistent": true}
)

set_context(dataset_name="lidar-dataset")

# Import the point clouds
execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_ONLY",
        "style": "GLOB_PATTERN",
        "glob_patt": {"absolute_path": "/path/to/data/*.pcd"}
    }
)

launch_app(dataset_name="lidar-dataset")

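Use case 4: Autonomous driving (multimodal groups)

This is the most complex case, with multiple cameras plus LiDAR per scene:

import fiftyone as fo
from pathlib import Path

# Create a dataset with group support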
dataset = fo.Dataset("driving-dataset", persistent=True)
dataset.add_group_field("group", default="front_camera")

data_dir = Path("/path/to/driving_data")
samples = []

# Process each scene folder
for scene_dir in sorted(data_dir.iterdir()):
    if not scene_dir.is_dir():
        continue

    group = fo.Group()

    # Map files to slices
    slice_mapping = {
        "front": "front_camera",
        "left": "left_camera",
        "right": "right_camera",
        "rear": "rear_camera",
        "lidar": "lidar",
        "radar": "radar"
    }

    for file in scene_dir.iterdir():
        # Determine the slice from the filename
        for key, slice_name in slice_mapping.items():
            if key in file.stem.lower():
                samples.append(fo.Sample(
                    filepath=str(file),
                    group=group.element(slice_name)
                ))
                break

dataset.add_samples(samples)
dataset.save()

print(f"Created {len(dataset.distinct('group.id'))} groups")
print(f"Slices: {dataset.group_slices}")
print(f"Media types: {dataset.group_media_types}")

# Launch the App
session = fo.launch_app(dataset)

Use case 5: Classification directory tree

# Scan the directory
# Found: cats/, dogs/, birds/ folders containing images

execute_operator(
    operator_uri="@voxel51/utils/create_dataset",
    params={"name": "classification-dataset", "persistent": true}
)

set_context(dataset_name="classification-dataset")

execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_AND_LABELS",
        "dataset_type": "Image Classification Directory Tree",
        "dataset_dir": {"absolute_path": "/path/to/classification"},
        "label_field": "ground_truth"
    }
)

launch_app(dataset_name="classification-dataset")

Use case 6: Mixed media (images + videos)

# Scan the directory
# Found: images/, videos/ folders

# Create the dataset
execute_operator(
    operator_uri="@voxel51/utils/create_dataset",
    params={"name": "mixed-media", "persistent": true}
)

set_context(dataset_name="mixed-media")

# Import the images
execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_ONLY",
        "style": "DIRECTORY",
        "directory": {"absolute_path": "/path/to/images"},
        "tags": ["image"]
    }
)

# Import the videos
execute_operator(
    operator_uri="@voxel51/io/import_samples",
    params={
        "import_type": "MEDIA_ONLY",
        "style": "DIRECTORY",
        "directory": {"absolute_path": "/path/to/videos"},
        "tags": ["video"]
    }
)

launch_app(dataset_name="mixed-media")

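Working with Groups

Understanding the group structure

In a grouped dataset:

Each group represents one scene/moment (e.g., one timestamp)
Each slice represents one modality (e.g., left camera, LiDAR)
All samples in a group share the same group.id
Each sample has a group.name that indicates its slice

# Access group information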
print(dataset.group_slices)        # ['front_camera', 'left_camera', 'lidar']
print(dataset.group_media_types)   # {'front_camera': 'image', 'lidar': 'point-cloud'}
print(dataset.default_group_slice) # 'front_camera'

# Iterate over groups
for group in dataset.iter_groups():
    print(f"Group has {len(group)} slices")
    for slice_name, sample in group.items():
        print(f"  {slice_name}: {sample.filepath}")

# Get views of specific slices
front_images = dataset.select_group_slices("front_camera")
all_point_clouds = dataset.select_group_slices(media_type="point-cloud")

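Viewing groups in the App

After launching the App:

A slice selector dropdown appears in the top bar
Select different slices to view each modality
Samples are synchronized: selecting a sample shows all members of its group
Use the grid view to see multiple slices side by side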

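Troubleshooting

Error: "Dataset already exists"

Use a different dataset name
Or delete the existing one: execute_operator("@voxel51/utils/delete_dataset", {"name": "dataset-name"})

Error: "No samples found"

Verify that the directory path is correct and accessible
Check that the file extensions are supported
For nested directories, make sure the scan is recursive

Error: "Labels path not found"

Verify that the label file/directory exists
Check that the path is absolute, not relative
Make sure the correct format was detected

Error: "Invalid group configuration"

Each group must have at least one sample
Slice names must be consistent across groups
Only one 3d slice is allowed per group

Slow imports

For large datasets, use delegated execution
Import in batches if needed
Consider using glob patterns to filter files

Point clouds not rendering

Make sure the .pcd files are valid
Check that FiftyOne 3D visualization is enabled
Verify that the point cloud plugin is installed

Groups not detected

Check that the folder structure matches an expected pattern
Verify that naming is consistent across scenes
The grouping may need to be specified manually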

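Best Practices

Always scan first - understand the data before importing
Confirm with the user - present the findings before creating the dataset
Use descriptive names - dataset names and label fields should be meaningful
Validate counts - make sure the imported samples match the source files
Handle errors gracefully - report problems clearly and continue with valid files
Use groups for multimodal data - do not flatten data that should be grouped
Set an appropriate default slice - choose the modality that is viewed most often
Tag imports - use tags to track import batches or sources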

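Performance Notes

Estimated import times:

1,000 images: ~10-30 seconds
10,000 images: ~2-5 minutes
100,000 images: ~20-60 minutes
Point clouds: roughly 2x slower than images
Videos: depends on frame extraction settings

Memory requirements:

About 1 KB of metadata per sample
Media files are referenced, not loaded into memory
Large datasets may require raising MongoDB limits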