YOLO Integration

This skill provides a comprehensive guide to integrating YOLO (You Only Look Once) object detection models using the Ultralytics library, covering YOLOv8 and YOLOv9 model loading, inference, custom training, object detection, instance segmentation, pose estimation, real-time inference, batch processing, API integration, performance optimization, and production deployment. Keywords: YOLO, object detection, Ultralytics, computer vision, deep learning, AI integration, image processing, real-time analytics, deployment optimization

Computer Vision · Updated 3/5/2026

name: YOLO Integration description: A comprehensive guide to integrating YOLO (You Only Look Once) object detection models (YOLOv8, YOLOv9) using the Ultralytics library.

YOLO Integration

Overview

YOLO (You Only Look Once) is a family of state-of-the-art object detection models known for their speed and accuracy. This skill covers YOLO integration using the Ultralytics library, including YOLOv8 and YOLOv9 models, model loading and inference, custom training, object detection, instance segmentation, pose estimation, real-time inference, batch processing, API integration, performance optimization, and production deployment.

Prerequisites

  • Understanding of computer vision and object detection concepts
  • Knowledge of PyTorch and deep learning
  • Familiarity with OpenCV and image processing
  • Understanding of dataset formats (COCO, YOLO)
  • Basic knowledge of FastAPI and REST APIs

Key Concepts

YOLO Models

  • YOLOv8: state-of-the-art object detection with an anchor-free detection head
  • YOLOv9: a newer YOLO release with improved accuracy
  • Model sizes: n (nano), s (small), m (medium), l (large), x (extra large)
  • Task variants: detection, segmentation, pose estimation, classification

Ultralytics Library

  • YOLO class: the main interface for model loading and inference
  • Training pipeline: built-in training with data augmentation
  • Export formats: ONNX, TensorRT, CoreML, TFLite
  • Results objects: structured detection results containing boxes, masks, and keypoints

Dataset Formats

  • YOLO format: normalized bounding boxes with class IDs
  • COCO format: an industry-standard annotation format
  • data.yaml: dataset configuration for training
  • Directory structure: organized image and label folders

Deployment Patterns

  • FastAPI server: a REST API for YOLO inference
  • Docker deployment: containerized YOLO services
  • Kubernetes: scalable deployment with GPU support
  • Real-time inference: webcam and video stream processing

Implementation Guide

YOLO Setup (Ultralytics)

Installation

# Basic installation
pip install ultralytics

# Install with GPU support
pip install ultralytics[torch]

# Install all dependencies
pip install ultralytics[all]

# Install from source
git clone https://github.com/ultralytics/ultralytics
cd ultralytics
pip install -e .

Verify Installation

from ultralytics import YOLO
import torch

# Check the version
from ultralytics import __version__
print(f"Ultralytics version: {__version__}")

# Check CUDA availability
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"CUDA device count: {torch.cuda.device_count()}")

# Test model loading
model = YOLO("yolov8n.pt")  # load a pretrained model
print(f"Model loaded successfully: {model.names}")

Available Models

from ultralytics import YOLO

# YOLOv8 models (n, s, m, l, x)
yolov8_models = {
    "nano": "yolov8n.pt",      # fastest, lowest accuracy
    "small": "yolov8s.pt",     # fast, good accuracy
    "medium": "yolov8m.pt",    # balanced
    "large": "yolov8l.pt",     # slower, higher accuracy
    "xlarge": "yolov8x.pt",    # slowest, highest accuracy
}

# YOLOv9 models (t, s, m, c, e)
yolov9_models = {
    "tiny": "yolov9t.pt",
    "small": "yolov9s.pt",
    "medium": "yolov9m.pt",
    "compact": "yolov9c.pt",
    "extended": "yolov9e.pt",
}

# Segmentation models
seg_models = [
    "yolov8n-seg.pt",
    "yolov8s-seg.pt",
]

# Pose estimation models
pose_models = [
    "yolov8n-pose.pt",
    "yolov8s-pose.pt",
]

Model Loading and Inference

Basic Inference

from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.pt")

# Run inference on an image
results = model("path/to/image.jpg")

# Access the results
for result in results:
    boxes = result.boxes  # bounding boxes
    masks = result.masks  # segmentation masks
    keypoints = result.keypoints  # pose keypoints
    probs = result.probs  # classification probabilities

    # Iterate over the detections
    for box in boxes:
        class_id = int(box.cls[0])
        class_name = model.names[class_id]
        confidence = float(box.conf[0])
        bbox = box.xyxy[0].tolist()  # [x1, y1, x2, y2]

        print(f"{class_name}: {confidence:.2f} at {bbox}")

Inference with Parameters

# Inference with custom parameters
results = model(
    "path/to/image.jpg",
    conf=0.25,        # confidence threshold
    iou=0.45,         # NMS IoU threshold
    max_det=100,      # maximum number of detections
    device="0",       # GPU device
    half=True,        # half precision (FP16)
    verbose=False,    # suppress output
    save=True,        # save results
    show=True,        # display results
    stream=False,     # set True to get a generator (memory-efficient for long sources)
)

# Inference on multiple images
results = model(["image1.jpg", "image2.jpg", "image3.jpg"])

Video Inference

# Inference on a video file
results = model("video.mp4", save=True)

# Webcam inference
results = model(source=0, show=True)

# RTSP stream inference
results = model("rtsp://username:password@ip:port/stream", stream=True)

# Process a video frame by frame
for result in model("video.mp4", stream=True):
    # process each frame
    boxes = result.boxes
    # your processing logic goes here

Custom Training

Dataset Preparation

Dataset structure:

dataset/
├── data.yaml
├── train/
│   ├── images/
│   │   ├── image1.jpg
│   │   ├── image2.jpg
│   │   └── ...
│   └── labels/
│       ├── image1.txt
│       ├── image2.txt
│       └── ...
├── val/
│   ├── images/
│   └── labels/
└── test/
    ├── images/
    └── labels/

data.yaml configuration:

# Dataset configuration
path: /path/to/dataset  # dataset root directory
train: train/images      # training images, relative to 'path'
val: val/images          # validation images, relative to 'path'
test: test/images        # test images (optional)

# Classes
names:
  0: person
  1: car
  2: dog
  3: cat

# Number of classes
nc: 4

Label format (YOLO format):

# One line per object in the label file: class_id center_x center_y width height
# All values are normalized to [0, 1]
0 0.5 0.5 0.3 0.4    # class 0 centered at (0.5, 0.5) with size (0.3, 0.4)
1 0.2 0.3 0.1 0.2    # class 1 at (0.2, 0.3) with size (0.1, 0.2)
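To make the normalization concrete, this small sketch (the helper name `yolo_to_pixel` is illustrative, not part of Ultralytics) converts one YOLO label line back to pixel coordinates:

```python
def yolo_to_pixel(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, x1, y1, x2, y2) in pixels."""
    class_id, cx, cy, w, h = line.split()
    # Scale the normalized center/size back to pixels
    cx, cy = float(cx) * img_w, float(cy) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    # Convert center/size to corner coordinates
    return int(class_id), cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

# The first example line above, on a 640x480 image
print(yolo_to_pixel("0 0.5 0.5 0.3 0.4", 640, 480))  # (0, 224.0, 144.0, 416.0, 336.0)
```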

Dataset conversion script:

import json
import os

def coco_to_yolo(coco_json_path, output_dir):
    """Convert COCO-format annotations to YOLO format."""
    with open(coco_json_path, 'r') as f:
        coco_data = json.load(f)

    # Map image_id to the file name and per-image dimensions
    image_info = {
        img['id']: (img['file_name'], img['width'], img['height'])
        for img in coco_data['images']
    }

    # Create the output directory
    os.makedirs(f"{output_dir}/labels", exist_ok=True)

    # Process the annotations
    for annotation in coco_data['annotations']:
        image_name, image_width, image_height = image_info[annotation['image_id']]
        label_name = os.path.splitext(image_name)[0] + '.txt'
        label_path = os.path.join(output_dir, 'labels', label_name)

        # Convert the bbox to YOLO format
        x, y, width, height = annotation['bbox']
        x_center = (x + width / 2) / image_width
        y_center = (y + height / 2) / image_height
        width_norm = width / image_width
        height_norm = height / image_height

        # Append to the label file (note: COCO category IDs may need remapping
        # to contiguous zero-based class IDs)
        with open(label_path, 'a') as f:
            f.write(f"{annotation['category_id']} {x_center} {y_center} {width_norm} {height_norm}\n")

# Usage
coco_to_yolo(
    coco_json_path="annotations.json",
    output_dir="dataset",
)

Training Configuration

from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.pt")  # load a pretrained model

# Train the model
results = model.train(
    data="data.yaml",           # dataset configuration
    epochs=100,                 # number of epochs
    batch=16,                   # batch size
    imgsz=640,                  # image size
    device="0",                 # GPU device
    workers=8,                  # number of dataloader workers
    patience=50,                # early-stopping patience
    save=True,                  # save checkpoints
    save_period=10,             # save every N epochs
    cache=True,                 # cache images
    project="runs/train",       # project directory
    name="experiment",          # experiment name
    exist_ok=False,             # error if the experiment directory already exists
    pretrained=True,            # use pretrained weights
    optimizer="SGD",            # optimizer (SGD, Adam, AdamW)
    lr0=0.01,                   # initial learning rate
    lrf=0.01,                   # final learning rate fraction
    momentum=0.937,             # SGD momentum
    weight_decay=0.0005,        # weight decay
    warmup_epochs=3,            # warmup epochs
    warmup_momentum=0.8,        # warmup momentum
    warmup_bias_lr=0.1,         # warmup bias learning rate
    box=7.5,                    # box loss gain
    cls=0.5,                    # classification loss gain
    dfl=1.5,                    # DFL loss gain
    mosaic=1.0,                 # mosaic augmentation probability
    mixup=0.0,                  # mixup augmentation probability
    copy_paste=0.0,             # copy-paste augmentation probability
    auto_augment="randaugment", # auto-augmentation policy
    erasing=0.4,                # random-erasing probability
    crop_fraction=1.0,          # image crop fraction
    hsv_h=0.015,                # HSV hue augmentation
    hsv_s=0.7,                  # HSV saturation augmentation
    hsv_v=0.4,                  # HSV value augmentation
    degrees=0.0,                # rotation degrees
    translate=0.1,              # translation
    scale=0.5,                  # scale
    shear=0.0,                  # shear
    perspective=0.0,            # perspective
    flipud=0.0,                 # vertical flip probability
    fliplr=0.5,                 # horizontal flip probability
    bgr=0.0,                    # BGR channel-swap probability
)

Fine-Tuning

from ultralytics import YOLO

# Load a pretrained model
model = YOLO("yolov8n.pt")

# Fine-tune with a lower learning rate, freezing the backbone;
# the `freeze` training argument freezes the first N layers of the model
results = model.train(
    data="data.yaml",
    epochs=50,
    batch=16,
    imgsz=640,
    lr0=0.001,  # lower learning rate for fine-tuning
    freeze=10,  # freeze the first 10 (backbone) layers
    optimizer="Adam",
    project="runs/train",
    name="fine_tune",
    pretrained=True,
)

# Continue training with all layers unfrozen (optional)
results = model.train(
    data="data.yaml",
    epochs=50,
    batch=16,
    imgsz=640,
    lr0=0.0001,  # even lower learning rate
    optimizer="Adam",
    project="runs/train",
    name="fine_tune_unfrozen",
)

Resume Training

# Resume from the last checkpoint
model = YOLO("runs/train/experiment/weights/last.pt")

# resume=True restores the data, epochs, and optimizer state from the checkpoint
results = model.train(resume=True)

Object Detection

Basic Detection

from ultralytics import YOLO
import cv2

model = YOLO("yolov8n.pt")

# Detect objects
results = model("image.jpg")

# Process the results
for result in results:
    # Get the image with detections drawn on it
    annotated_image = result.plot()

    # Save the annotated image
    cv2.imwrite("result.jpg", annotated_image)

    # Get the detections as a DataFrame
    df = result.to_df()
    print(df)

    # Get the detections as JSON
    detections = result.tojson()
    print(detections)

Custom Detection Pipeline

from ultralytics import YOLO
import cv2
import numpy as np

class YOLODetector:
    def __init__(self, model_path="yolov8n.pt", conf_threshold=0.25):
        self.model = YOLO(model_path)
        self.conf_threshold = conf_threshold

    def detect(self, image, filter_classes=None):
        """
        Detect objects in an image.

        Args:
            image: input image (numpy array or path)
            filter_classes: list of class names to keep (None for all)

        Returns:
            List of detections with class, confidence, and bbox
        """
        results = self.model(image, conf=self.conf_threshold, verbose=False)

        detections = []
        for result in results:
            for box in result.boxes:
                class_id = int(box.cls[0])
                class_name = self.model.names[class_id]
                confidence = float(box.conf[0])
                bbox = box.xyxy[0].cpu().numpy().astype(int)

                # Filter by class if requested
                if filter_classes is None or class_name in filter_classes:
                    detections.append({
                        "class": class_name,
                        "class_id": class_id,
                        "confidence": confidence,
                        "bbox": bbox.tolist()  # [x1, y1, x2, y2]
                    })

        return detections

    def draw_detections(self, image, detections):
        """Draw detections on an image."""
        image = image.copy()

        for det in detections:
            x1, y1, x2, y2 = det["bbox"]
            class_name = det["class"]
            confidence = det["confidence"]

            # Draw the bounding box
            cv2.rectangle(image, (x1, y1), (x2, y2), (0, 255, 0), 2)

            # Draw the label
            label = f"{class_name}: {confidence:.2f}"
            label_size, _ = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2)
            cv2.rectangle(image, (x1, y1 - label_size[1] - 10),
                         (x1 + label_size[0], y1), (0, 255, 0), -1)
            cv2.putText(image, label, (x1, y1 - 5),
                       cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 2)

        return image

# Usage
detector = YOLODetector("yolov8n.pt", conf_threshold=0.5)

# Detect
image = cv2.imread("image.jpg")
detections = detector.detect(image, filter_classes=["person", "car"])

# Draw
annotated = detector.draw_detections(image, detections)
cv2.imwrite("detections.jpg", annotated)

Batch Processing

from ultralytics import YOLO
import os
from pathlib import Path

def batch_detect(input_dir, output_dir, model_path="yolov8n.pt"):
    """Process all images in a directory."""
    model = YOLO(model_path)

    # Create the output directory
    os.makedirs(output_dir, exist_ok=True)

    # Process the images
    for image_path in Path(input_dir).glob("*.jpg"):
        print(f"Processing {image_path.name}...")

        # Detect
        results = model(str(image_path), save=False)

        # Save the annotated image and the detections as JSON
        for result in results:
            output_path = os.path.join(output_dir, f"detected_{image_path.name}")
            result.save(output_path)

            json_path = os.path.join(output_dir, f"{image_path.stem}.json")
            with open(json_path, "w") as f:
                f.write(result.tojson())

# Usage
batch_detect(
    input_dir="images/",
    output_dir="output/",
    model_path="yolov8n.pt"
)

Instance Segmentation

Segmentation Inference

from ultralytics import YOLO
import cv2
import numpy as np

# Load a segmentation model
model = YOLO("yolov8n-seg.pt")

# Run segmentation
results = model("image.jpg")

# Process the segmentation results
for result in results:
    # Get the masks
    masks = result.masks

    if masks is not None:
        # Get the raw mask data
        mask_data = masks.data.cpu().numpy()  # (N, H, W)

        # Get the segmentation polygons
        for i, mask in enumerate(masks.xy):
            class_id = int(result.boxes.cls[i])
            class_name = model.names[class_id]
            confidence = float(result.boxes.conf[i])

            print(f"{class_name}: {confidence:.2f}")
            print(f"Polygon shape: {mask.shape}")

        # Draw the segmentation masks
        annotated = result.plot()
        cv2.imwrite("segmentation.jpg", annotated)

Mask Processing

import numpy as np
import cv2

def extract_object(image, mask):
    """Extract an object from an image using a mask."""
    # Convert the mask to uint8
    mask_uint8 = (mask * 255).astype(np.uint8)

    # Resize the mask to the image size if needed
    # (YOLO masks are produced at the inference resolution)
    if mask_uint8.shape[:2] != image.shape[:2]:
        mask_uint8 = cv2.resize(mask_uint8, (image.shape[1], image.shape[0]))

    # Apply the mask to the image
    result = cv2.bitwise_and(image, image, mask=mask_uint8)

    # Create a transparent background
    result_rgba = cv2.cvtColor(result, cv2.COLOR_BGR2BGRA)
    result_rgba[:, :, 3] = mask_uint8

    return result_rgba

# Usage
for result in results:
    if result.masks is not None:
        for i, mask in enumerate(result.masks.data):
            mask_np = mask.cpu().numpy()
            extracted = extract_object(image, mask_np)
            cv2.imwrite(f"object_{i}.png", extracted)

Pose Estimation

Pose Detection

from ultralytics import YOLO
import cv2

# Load a pose model
model = YOLO("yolov8n-pose.pt")

# Detect poses
results = model("image.jpg")

# Process the pose results
for result in results:
    keypoints = result.keypoints

    if keypoints is not None:
        # keypoints.xy has shape (N, 17, 2) - N people, 17 keypoints, (x, y);
        # confidences live separately in keypoints.conf with shape (N, 17)
        for i, kpts in enumerate(keypoints.xy):
            print(f"Person {i}:")
            for j, (x, y) in enumerate(kpts):
                conf = float(keypoints.conf[i][j])
                print(f"  Keypoint {j}: ({x:.1f}, {y:.1f}) conf={conf:.2f}")

        # Draw the poses
        annotated = result.plot()
        cv2.imwrite("pose.jpg", annotated)

Pose Analysis

import numpy as np

class PoseAnalyzer:
    def __init__(self):
        # The 17 COCO keypoints
        self.keypoint_names = [
            "nose", "left_eye", "right_eye", "left_ear", "right_ear",
            "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
            "left_wrist", "right_wrist", "left_hip", "right_hip",
            "left_knee", "right_knee", "left_ankle", "right_ankle"
        ]

    def calculate_angle(self, a, b, c):
        """Calculate the angle at point b formed by three points."""
        a = np.array(a)
        b = np.array(b)
        c = np.array(c)

        radians = np.arctan2(c[1] - b[1], c[0] - b[0]) - \
                  np.arctan2(a[1] - b[1], a[0] - b[0])
        angle = np.abs(radians * 180.0 / np.pi)

        if angle > 180.0:
            angle = 360 - angle

        return angle

    def analyze_pose(self, keypoints):
        """Analyze a pose and return its joint angles."""
        # Keypoint indices
        left_shoulder = 5
        left_elbow = 7
        left_wrist = 9
        right_shoulder = 6
        right_elbow = 8
        right_wrist = 10
        left_hip = 11
        right_hip = 12
        left_knee = 13
        right_knee = 14
        left_ankle = 15
        right_ankle = 16

        angles = {}

        # Left arm angle
        angles["left_arm"] = self.calculate_angle(
            keypoints[left_shoulder],
            keypoints[left_elbow],
            keypoints[left_wrist]
        )

        # Right arm angle
        angles["right_arm"] = self.calculate_angle(
            keypoints[right_shoulder],
            keypoints[right_elbow],
            keypoints[right_wrist]
        )

        # Left leg angle
        angles["left_leg"] = self.calculate_angle(
            keypoints[left_hip],
            keypoints[left_knee],
            keypoints[left_ankle]
        )

        # Right leg angle
        angles["right_leg"] = self.calculate_angle(
            keypoints[right_hip],
            keypoints[right_knee],
            keypoints[right_ankle]
        )

        return angles

# Usage
model = YOLO("yolov8n-pose.pt")
analyzer = PoseAnalyzer()

results = model("image.jpg")
for result in results:
    if result.keypoints is not None:
        for i, kpts in enumerate(result.keypoints.xy):
            angles = analyzer.analyze_pose(kpts.cpu().numpy())
            print(f"Person {i} angles: {angles}")
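The angle computation is easy to sanity-check against coordinates with a known geometry. This standalone snippet repeats the same atan2 formula outside the class:

```python
import numpy as np

def calculate_angle(a, b, c):
    """Angle at point b formed by segments b->a and b->c, in degrees."""
    radians = np.arctan2(c[1] - b[1], c[0] - b[0]) - np.arctan2(a[1] - b[1], a[0] - b[0])
    angle = abs(radians * 180.0 / np.pi)
    return 360 - angle if angle > 180.0 else angle

# A right angle: a point above the origin, the origin, and a point to its right
print(calculate_angle((0, 1), (0, 0), (1, 0)))  # 90.0
```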

Real-Time Inference

Webcam Inference

from ultralytics import YOLO
import cv2

model = YOLO("yolov8n.pt")

# Open the webcam
cap = cv2.VideoCapture(0)

while cap.isOpened():
    # Read a frame
    success, frame = cap.read()
    if not success:
        break

    # Run inference
    results = model(frame, verbose=False)

    # Draw the results
    annotated_frame = results[0].plot()

    # Display
    cv2.imshow("YOLO Inference", annotated_frame)

    # Press 'q' to quit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Real-Time Inference with an FPS Counter

from ultralytics import YOLO
import cv2
import time

model = YOLO("yolov8n.pt")
cap = cv2.VideoCapture(0)

fps = 0
fps_counter = 0
fps_start_time = time.time()

while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break

    # Run inference
    results = model(frame, verbose=False)

    # Draw the results
    annotated_frame = results[0].plot()

    # Update the FPS value once per second
    fps_counter += 1
    if time.time() - fps_start_time >= 1.0:
        fps = fps_counter
        fps_counter = 0
        fps_start_time = time.time()

    # Draw the FPS on every frame
    cv2.putText(annotated_frame, f"FPS: {fps}", (10, 30),
               cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)

    cv2.imshow("YOLO Inference", annotated_frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

Multithreaded Inference

from ultralytics import YOLO
import cv2
import threading
import queue

class YOLOInferenceThread:
    def __init__(self, model_path="yolov8n.pt"):
        self.model = YOLO(model_path)
        self.input_queue = queue.Queue(maxsize=1)
        self.output_queue = queue.Queue(maxsize=1)
        self.running = False

    def start(self):
        self.running = True
        self.thread = threading.Thread(target=self._run_inference)
        self.thread.start()

    def stop(self):
        self.running = False
        self.thread.join()

    def _run_inference(self):
        while self.running:
            try:
                frame = self.input_queue.get(timeout=0.1)
                results = self.model(frame, verbose=False)
                self.output_queue.put(results)
            except queue.Empty:
                continue

    def predict(self, frame):
        # Drop the frame if the worker is still busy, rather than blocking
        try:
            self.input_queue.put_nowait(frame)
        except queue.Full:
            pass
        try:
            return self.output_queue.get(timeout=1.0)
        except queue.Empty:
            return None

# Usage
inference_thread = YOLOInferenceThread("yolov8n.pt")
inference_thread.start()

cap = cv2.VideoCapture(0)

while cap.isOpened():
    success, frame = cap.read()
    if not success:
        break

    # Non-blocking inference
    results = inference_thread.predict(frame)

    if results:
        annotated_frame = results[0].plot()
        cv2.imshow("YOLO Inference", annotated_frame)

    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

inference_thread.stop()
cap.release()
cv2.destroyAllWindows()

Batch Processing

Efficient Batch Inference

from ultralytics import YOLO
from pathlib import Path

def batch_inference(image_paths, model_path="yolov8n.pt", batch_size=32):
    """Process images in batches."""
    model = YOLO(model_path)

    results = []
    for i in range(0, len(image_paths), batch_size):
        batch = image_paths[i:i + batch_size]
        print(f"Processing batch {i//batch_size + 1}/{(len(image_paths)-1)//batch_size + 1}")

        batch_results = model(batch, verbose=False)
        results.extend(batch_results)

    return results

# Usage
image_paths = list(Path("images/").glob("*.jpg"))
results = batch_inference(image_paths, batch_size=16)

Parallel Batch Processing

from ultralytics import YOLO
import torch
import torch.multiprocessing as mp
from pathlib import Path

def process_batch(args):
    """Worker function for parallel processing."""
    image_paths, model_path, device = args
    model = YOLO(model_path)

    results = []
    for image_path in image_paths:
        result = model(str(image_path), device=device, verbose=False)
        results.append((image_path, result))

    return results

def parallel_batch_inference(image_paths, model_path="yolov8n.pt", num_workers=4):
    """Process images in parallel."""
    # Split the images across the workers
    chunk_size = max(1, len(image_paths) // num_workers)
    chunks = [image_paths[i:i + chunk_size] for i in range(0, len(image_paths), chunk_size)]

    # Prepare the arguments (round-robin over the available GPUs)
    args = [(chunk, model_path, i % max(1, torch.cuda.device_count()))
            for i, chunk in enumerate(chunks)]

    # Run in parallel (CUDA requires the 'spawn' start method)
    with mp.get_context("spawn").Pool(num_workers) as pool:
        results = pool.map(process_batch, args)

    # Flatten the results
    all_results = []
    for chunk_results in results:
        all_results.extend(chunk_results)

    return all_results

# Usage
image_paths = list(Path("images/").glob("*.jpg"))
results = parallel_batch_inference(image_paths, num_workers=4)
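The floor-division chunking above gives the last worker whatever remainder is left over. When a more even split matters, it can be sketched as follows (the helper name is illustrative):

```python
def split_evenly(items, num_chunks):
    """Split a list into contiguous chunks whose sizes differ by at most one."""
    num_chunks = max(1, min(num_chunks, len(items) or 1))
    base, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks absorb one leftover item each
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks

print([len(c) for c in split_evenly(list(range(10)), 4)])  # [3, 3, 2, 2]
```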

API Integration

FastAPI YOLO Server

from fastapi import FastAPI, File, UploadFile, HTTPException
from ultralytics import YOLO
import cv2
import numpy as np
from typing import List
from pydantic import BaseModel

app = FastAPI(title="YOLO Detection API")

# Load the model once at startup
model = YOLO("yolov8n.pt")

class Detection(BaseModel):
    class_name: str
    class_id: int
    confidence: float
    bbox: List[float]  # [x1, y1, x2, y2]

class DetectionResponse(BaseModel):
    detections: List[Detection]
    image_width: int
    image_height: int

@app.post("/detect", response_model=DetectionResponse)
async def detect(file: UploadFile = File(...)):
    """Detect objects in an uploaded image."""
    try:
        # Read and decode the image
        contents = await file.read()
        nparr = np.frombuffer(contents, np.uint8)
        image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
        if image is None:
            raise HTTPException(status_code=400, detail="Invalid image file")

        # Run inference
        results = model(image, verbose=False)

        # Process the results
        detections = []
        for result in results:
            for box in result.boxes:
                class_id = int(box.cls[0])
                class_name = model.names[class_id]
                confidence = float(box.conf[0])
                bbox = box.xyxy[0].tolist()

                detections.append(Detection(
                    class_name=class_name,
                    class_id=class_id,
                    confidence=confidence,
                    bbox=bbox
                ))

        return DetectionResponse(
            detections=detections,
            image_width=image.shape[1],
            image_height=image.shape[0]
        )
    except HTTPException:
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/classes")
async def get_classes():
    """Get the available classes."""
    return model.names

@app.get("/health")
async def health():
    """Health check."""
    return {"status": "healthy"}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

Client Usage

import requests

# Upload an image and run detection
with open("image.jpg", "rb") as f:
    response = requests.post(
        "http://localhost:8000/detect",
        files={"file": f}
    )

result = response.json()
print(f"Found {len(result['detections'])} objects")

for detection in result["detections"]:
    print(f"{detection['class_name']}: {detection['confidence']:.2f}")

Performance Optimization

Model Optimization

from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.pt")

# Export to ONNX for faster inference
model.export(format="onnx", opset=12)

# Export to TensorRT for NVIDIA GPUs
model.export(format="engine", device=0)

# Export to CoreML for Apple devices
model.export(format="coreml")

# Use half precision (FP16) by passing half=True at inference time
model = YOLO("yolov8n.pt")
results = model("image.jpg", half=True)

Inference Optimization

from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Optimized inference settings
results = model(
    "image.jpg",
    imgsz=640,           # smaller image sizes speed up inference
    conf=0.5,            # higher confidence threshold to reduce detections
    max_det=50,          # limit the maximum number of detections
    half=True,           # use FP16
    device="0",          # use the GPU
    verbose=False,       # disable verbose output
    augment=False,       # disable test-time augmentation for speed
    agnostic_nms=False,  # disable class-agnostic NMS
    classes=None,        # optionally restrict to specific class IDs
)

TensorRT Optimization

from ultralytics import YOLO

# Export to TensorRT
model = YOLO("yolov8n.pt")
model.export(format="engine", device=0, half=True, workspace=4)

# Use the TensorRT engine
model = YOLO("yolov8n.engine")
results = model("image.jpg")

Post-Processing Results

Result Filtering

from ultralytics import YOLO

model = YOLO("yolov8n.pt")

# Run inference
results = model("image.jpg")

# Filter by confidence
filtered_results = []
for result in results:
    for box in result.boxes:
        if float(box.conf[0]) > 0.7:  # keep only high-confidence detections
            filtered_results.append(box)

# Filter by class (uses the last `result` from the loop above)
person_boxes = [box for box in result.boxes if int(box.cls[0]) == 0]  # class 0 = person

# Filter by size
large_boxes = []
for box in result.boxes:
    x1, y1, x2, y2 = box.xyxy[0]
    width = x2 - x1
    height = y2 - y1
    if width * height > 10000:  # keep only large objects
        large_boxes.append(box)
Non-Maximum Suppression (Custom)

import numpy as np

def custom_nms(boxes, scores, iou_threshold=0.45):
    """A custom NMS implementation over an (N, 4) array of [x1, y1, x2, y2] boxes."""
    if len(boxes) == 0:
        return []

    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Sort by score, highest first
    indices = np.argsort(scores)[::-1]

    keep = []
    while len(indices) > 0:
        # Keep the highest-scoring box
        current = indices[0]
        keep.append(current)

        if len(indices) == 1:
            break

        # Compute the IoU against the remaining boxes
        ious = calculate_iou(boxes[current], boxes[indices[1:]])

        # Drop boxes with a high IoU
        indices = indices[1:][ious < iou_threshold]

    return keep

def calculate_iou(box, boxes):
    """Compute the IoU between one box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])

    intersection = np.maximum(0, x2 - x1) * np.maximum(0, y2 - y1)

    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])

    union = area + areas - intersection

    return np.where(union > 0, intersection / union, 0)
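The IoU math is easy to sanity-check with boxes whose overlap is known; this self-contained snippet recomputes the single-pair case:

```python
def iou(box1, box2):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    x2, y2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0

# A 10x10 box against its bottom half: intersection 50, union 100
print(iou([0, 0, 10, 10], [0, 0, 10, 5]))  # 0.5
```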

Result Visualization

from ultralytics import YOLO
import cv2
import numpy as np

def get_color(class_id):
    """Generate a consistent color for each class."""
    np.random.seed(class_id)
    return tuple(np.random.randint(0, 255, 3).tolist())

model = YOLO("yolov8n.pt")
results = model("image.jpg")

# Custom visualization
image = cv2.imread("image.jpg")

for result in results:
    for box in result.boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        class_id = int(box.cls[0])
        class_name = model.names[class_id]
        confidence = float(box.conf[0])

        # Pick a color based on the class
        color = get_color(class_id)

        # Draw the bounding box
        cv2.rectangle(image, (x1, y1), (x2, y2), color, 2)

        # Draw the label
        label = f"{class_name}: {confidence:.2f}"
        (label_width, label_height), _ = cv2.getTextSize(
            label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 2
        )

        cv2.rectangle(image, (x1, y1 - label_height - 10),
                     (x1 + label_width, y1), color, -1)
        cv2.putText(image, label, (x1, y1 - 5),
                   cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 255, 255), 2)

cv2.imwrite("custom_result.jpg", image)

Production Deployment

Docker Deployment

Dockerfile:

FROM python:3.10-slim

# Install system dependencies
RUN apt-get update && apt-get install -y \
    libgl1-mesa-glx \
    libglib2.0-0 \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application
COPY app.py .
COPY models/ ./models/

# Expose the port
EXPOSE 8000

# Run the application
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]

requirements.txt:

ultralytics>=8.0.0
fastapi>=0.100.0
uvicorn>=0.23.0
python-multipart>=0.0.6

Kubernetes Deployment

deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: yolo-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: yolo-api
  template:
    metadata:
      labels:
        app: yolo-api
    spec:
      containers:
      - name: yolo-api
        image: yolo-api:latest
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: 1
          requests:
            memory: "2Gi"
            cpu: "1"
---
apiVersion: v1
kind: Service
metadata:
  name: yolo-api
spec:
  selector:
    app: yolo-api
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer

Production Best Practices

from ultralytics import YOLO
import threading
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class YOLOModel:
    _instance = None
    _model = None
    _lock = threading.Lock()

    def __new__(cls, model_path="yolov8n.pt"):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._model = YOLO(model_path)
            logger.info(f"Model loaded from {model_path}")
        return cls._instance

    @classmethod
    def predict(cls, image, **kwargs):
        """Prediction serialized behind a lock for thread safety."""
        try:
            with cls._lock:
                results = cls._model(image, verbose=False, **kwargs)
            return results
        except Exception as e:
            logger.error(f"Prediction error: {e}")
            raise

# Usage - the singleton pattern ensures the model is loaded only once
model = YOLOModel("yolov8n.pt")
results = model.predict("image.jpg")

Best Practices

Model Selection

  1. Choose a model size based on your requirements

    • Use yolov8n for real-time applications
    • Use yolov8s/m for balanced performance
    • Use yolov8l/x for maximum accuracy
  2. Use task-specific models

    • Use yolov8n-seg.pt for instance segmentation
    • Use yolov8n-pose.pt for pose estimation
    • Use detection models for object detection only

Training Best Practices

  1. Dataset preparation

    • Use high-quality annotations
    • Ensure consistent image sizes
    • Balance the class distribution
    • Use data augmentation
  2. Training configuration

    • Start from pretrained weights
    • Use an appropriate learning rate
    • Monitor validation metrics
    • Save checkpoints regularly
  3. Fine-tuning

    • Freeze backbone layers initially
    • Use a lower learning rate
    • Unfreeze layers gradually
    • Watch for overfitting
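Part of the dataset-preparation checklist can be automated. This sketch (the function name is illustrative) flags images that are missing a matching YOLO label file:

```python
from pathlib import Path

def find_unlabeled(images_dir, labels_dir, exts=(".jpg", ".jpeg", ".png")):
    """Return the stems of images that have no matching .txt label file."""
    labels = Path(labels_dir)
    missing = []
    for image_path in sorted(Path(images_dir).iterdir()):
        # Only consider image files, matched on the stem (image1.jpg -> image1.txt)
        if image_path.suffix.lower() in exts:
            if not (labels / (image_path.stem + ".txt")).exists():
                missing.append(image_path.stem)
    return missing

# Usage: find_unlabeled("dataset/train/images", "dataset/train/labels")
```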

Inference Optimization

  1. Use half precision

    model = YOLO("yolov8n.pt")
    results = model("image.jpg", half=True)

  2. Optimize inference parameters

    • Set an appropriate confidence threshold
    • Limit the maximum number of detections
    • Disable verbose output
    • Use GPU acceleration
  3. Export optimized models

    • Export to ONNX for cross-platform deployment
    • Export to TensorRT for NVIDIA GPUs
    • Export to CoreML for Apple devices
Production Deployment

  1. Use a singleton pattern

    • Load the model once at startup
    • Reuse the model across requests
    • Avoid loading the model per request
  2. Implement error handling

    • Handle invalid inputs gracefully
    • Log errors appropriately
    • Return meaningful error messages
  3. Monitor performance

    • Track inference latency
    • Monitor GPU memory usage
    • Alert on performance degradation
  4. Use appropriate hardware

    • Use GPUs for real-time applications
    • Consider edge devices for mobile deployment
    • Optimize for the target platform

Related Skills
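Latency tracking can start as simply as timing the predict call. This generic sketch (names are illustrative) measures any callable after a few warmup runs:

```python
import time

def measure_latency(fn, warmup=2, runs=10):
    """Time a callable: run warmup calls, then return per-run latencies in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches, JIT, GPU kernels, etc.
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies

# Usage with a loaded YOLO model (assumed to exist):
#   latencies = measure_latency(lambda: model("image.jpg", verbose=False))
#   print(f"p50 = {sorted(latencies)[len(latencies) // 2]:.1f} ms")
```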