name: Computer Vision
description: Implement computer vision tasks including image classification, object detection, segmentation, and pose estimation using PyTorch and TensorFlow
Computer Vision
Overview
Computer vision enables machines to understand visual information from images and videos, powering applications such as autonomous driving, medical imaging, and surveillance.
Use Cases
- Image classification and object recognition tasks
- Object detection and localization in images
- Semantic or instance segmentation projects
- Pose estimation and human activity recognition
- Facial recognition and biometric systems
- Medical imaging analysis and diagnostics
Computer Vision Tasks
- Image classification: assigning images to categories
- Object detection: locating and classifying objects within an image
- Semantic segmentation: pixel-level classification
- Instance segmentation: detecting individual object instances
- Pose estimation: identifying human body joints
- Facial recognition: identifying individuals in images
Popular Architectures
- Classification: ResNet, VGG, EfficientNet, Vision Transformer
- Detection: YOLO, Faster R-CNN, SSD, RetinaNet
- Segmentation: U-Net, DeepLab, Mask R-CNN
- Pose: OpenPose, PoseNet, HRNet
Python Implementation
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image, ImageDraw
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision import transforms, models, datasets
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import cv2
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
print("=== 1. Image Classification CNN ===")

# Define the image classification model
class ImageClassifierCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(32),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(64),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(128),
            nn.MaxPool2d(2, 2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(128 * 4 * 4, 256),  # assumes 32x32 input images
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

model = ImageClassifierCNN(num_classes=10)
print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")
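The classifier above is only defined, never trained. A minimal sketch of the standard training step on random stand-in data (a tiny linear model substitutes for ImageClassifierCNN to keep the sketch self-contained; the real model plugs into the same loop):

```python
import torch
import torch.nn as nn

# Tiny stand-in classifier; ImageClassifierCNN above uses the exact same loop
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One random batch stands in for iterating over a real DataLoader
images = torch.randn(8, 3, 32, 32)   # batch of 8 RGB 32x32 images
labels = torch.randint(0, 10, (8,))  # random class targets

model.train()
optimizer.zero_grad()
logits = model(images)            # forward pass
loss = criterion(logits, labels)  # classification loss
loss.backward()                   # backpropagate
optimizer.step()                  # update weights
print(f"batch loss: {loss.item():.4f}")
```

In a real run this step is wrapped in `for images, labels in loader:` inside an epoch loop.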
# 2. Object detection setup
print("\n=== 2. Object Detection Framework ===")

class ObjectDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # Backbone network
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
        )
        # Bounding-box regression head
        self.bbox_head = nn.Sequential(
            nn.Linear(64 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, 4)  # x, y, w, h
        )
        # Class prediction head
        self.class_head = nn.Sequential(
            nn.Linear(64 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, 10)  # 10 classes
        )

    def forward(self, x):
        features = self.backbone(x)
        features_flat = features.view(features.size(0), -1)
        bboxes = self.bbox_head(features_flat)
        classes = self.class_head(features_flat)
        return bboxes, classes

detector = ObjectDetector()
print(f"Detector parameters: {sum(p.numel() for p in detector.parameters()):,}")
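The two heads are typically optimized jointly with a multi-task loss. A minimal sketch using hypothetical random predictions and targets (not produced by the detector above):

```python
import torch
import torch.nn as nn

# Hypothetical head outputs and targets for a batch of 4 images (random stand-ins)
pred_boxes = torch.rand(4, 4)            # predicted (x, y, w, h)
pred_logits = torch.randn(4, 10)         # class scores for 10 classes
target_boxes = torch.rand(4, 4)
target_classes = torch.randint(0, 10, (4,))

# Multi-task loss: Smooth L1 for box regression + cross-entropy for classes
bbox_loss = nn.SmoothL1Loss()(pred_boxes, target_boxes)
cls_loss = nn.CrossEntropyLoss()(pred_logits, target_classes)
total_loss = bbox_loss + cls_loss
print(f"bbox: {bbox_loss.item():.4f}  cls: {cls_loss.item():.4f}")
```

Real detectors also weight the two terms and handle background/anchor matching, which this sketch omits.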
# 3. Semantic segmentation
print("\n=== 3. Semantic Segmentation U-Net ===")

class UNet(nn.Module):
    def __init__(self, num_classes=5):
        super().__init__()
        # Encoder
        self.enc1 = self._conv_block(3, 32)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.enc2 = self._conv_block(32, 64)
        self.pool2 = nn.MaxPool2d(2, 2)
        # Bottleneck
        self.bottleneck = self._conv_block(64, 128)
        # Decoder
        self.upconv2 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec2 = self._conv_block(128, 64)
        self.upconv1 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec1 = self._conv_block(64, 32)
        # Final output
        self.out = nn.Conv2d(32, num_classes, 1)

    def _conv_block(self, in_channels, out_channels):
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        enc1 = self.enc1(x)
        enc2 = self.enc2(self.pool1(enc1))
        bottleneck = self.bottleneck(self.pool2(enc2))
        # Skip connections: concatenate encoder features with upsampled decoder features
        dec2 = self.dec2(torch.cat([self.upconv2(bottleneck), enc2], 1))
        dec1 = self.dec1(torch.cat([self.upconv1(dec2), enc1], 1))
        return self.out(dec1)

unet = UNet(num_classes=5)
print(f"U-Net parameters: {sum(p.numel() for p in unet.parameters()):,}")
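Segmentation training computes the classification loss per pixel, directly on the (N, C, H, W) logits a U-Net emits. A minimal sketch with random stand-in tensors:

```python
import torch
import torch.nn as nn

# Random stand-ins for a U-Net output: batch of 2, 5 classes, 32x32 pixels
logits = torch.randn(2, 5, 32, 32)
targets = torch.randint(0, 5, (2, 32, 32))  # ground-truth class index per pixel

# nn.CrossEntropyLoss accepts (N, C, H, W) logits against (N, H, W) targets,
# averaging the per-pixel loss over the whole batch
loss = nn.CrossEntropyLoss()(logits, targets)

# The predicted mask is the argmax over the class dimension
pred_mask = logits.argmax(dim=1)
print(pred_mask.shape)  # torch.Size([2, 32, 32])
```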
# 4. Transfer learning
print("\n=== 4. Transfer Learning with Pretrained Models ===")

try:
    # Load a pretrained ResNet18 (the `weights` argument replaces the deprecated `pretrained=True`)
    pretrained_model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    num_ftrs = pretrained_model.fc.in_features
    pretrained_model.fc = nn.Linear(num_ftrs, 10)
    print("Pretrained ResNet18 adapted to 10 classes")
    print(f"Parameters: {sum(p.numel() for p in pretrained_model.parameters()):,}")
except Exception:
    print("Pretrained model unavailable")
# 5. Image preprocessing and augmentation
print("\n=== 5. Image Preprocessing and Augmentation ===")

transform_basic = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

transform_augmented = transforms.Compose([
    transforms.RandomRotation(20),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])

print("Augmentation transforms defined")
# 6. Synthetic image data
print("\n=== 6. Synthetic Image Data Creation ===")

def create_synthetic_images(num_images=100, img_size=32):
    """Create synthetic images containing simple shapes."""
    images = []
    labels = []
    for _ in range(num_images):
        img = np.full((img_size, img_size, 3), 255, dtype=np.uint8)
        # Draw a random shape (OpenCV drawing expects a uint8 image and int coordinates)
        shape_type = np.random.randint(0, 3)
        if shape_type == 0:  # Circle
            center = (int(np.random.randint(5, img_size - 5)), int(np.random.randint(5, img_size - 5)))
            radius = int(np.random.randint(3, 10))
            cv2.circle(img, center, radius, (0, 0, 0), -1)
            labels.append(0)
        elif shape_type == 1:  # Rectangle
            pt1 = (int(np.random.randint(0, img_size - 10)), int(np.random.randint(0, img_size - 10)))
            pt2 = (pt1[0] + int(np.random.randint(5, 15)), pt1[1] + int(np.random.randint(5, 15)))
            cv2.rectangle(img, pt1, pt2, (0, 0, 0), -1)
            labels.append(1)
        else:  # Triangle
            pts = np.array([[np.random.randint(0, img_size), np.random.randint(0, img_size)],
                            [np.random.randint(0, img_size), np.random.randint(0, img_size)],
                            [np.random.randint(0, img_size), np.random.randint(0, img_size)]])
            cv2.drawContours(img, [pts], 0, (0, 0, 0), -1)
            labels.append(2)
        images.append(img.astype(np.float32) / 255.0)
    return np.array(images), np.array(labels)

X_images, y_labels = create_synthetic_images(num_images=300, img_size=32)
print(f"Synthetic dataset: {X_images.shape}, labels: {y_labels.shape}")
print(f"Class distribution: {np.bincount(y_labels)}")
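To feed NumPy image arrays like these into a PyTorch model, they are usually wrapped in a TensorDataset after moving channels into the (N, C, H, W) layout. A sketch with random stand-in data of the same shape:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins for the (N, H, W, C) float images and labels built above
X = np.random.rand(30, 32, 32, 3).astype(np.float32)
y = np.random.randint(0, 3, size=30)

# PyTorch conv layers expect (N, C, H, W), so move the channel axis forward
X_t = torch.from_numpy(X).permute(0, 3, 1, 2).contiguous()
y_t = torch.from_numpy(y).long()

loader = DataLoader(TensorDataset(X_t, y_t), batch_size=8, shuffle=True)
batch_x, batch_y = next(iter(loader))
print(batch_x.shape, batch_y.shape)  # torch.Size([8, 3, 32, 32]) torch.Size([8])
```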
# 7. Visualization
print("\n=== 7. Visualization ===")

fig, axes = plt.subplots(3, 3, figsize=(12, 10))
# Display synthetic images
for i in range(9):
    idx = i % len(X_images)
    axes[i // 3, i % 3].imshow(X_images[idx])
    axes[i // 3, i % 3].set_title(f"Class {y_labels[idx]}")
    axes[i // 3, i % 3].axis('off')
plt.suptitle("Synthetic Image Dataset", fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('synthetic_images.png', dpi=100, bbox_inches='tight')
print("Synthetic images saved as 'synthetic_images.png'")
# 8. Model architecture comparison
print("\n=== 8. Architecture Comparison ===")

architectures_info = {
    'CNN': ImageClassifierCNN(),
    'ObjectDetector': ObjectDetector(),
    'U-Net': UNet(),
}
arch_data = {
    'Architecture': list(architectures_info.keys()),
    'Parameters': [sum(p.numel() for p in m.parameters()) for m in architectures_info.values()],
    'Use Case': ['Classification', 'Detection', 'Segmentation']
}
arch_df = pd.DataFrame(arch_data)
print("\nArchitecture comparison:")
print(arch_df.to_string(index=False))

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Parameter comparison
axes[0].barh(arch_df['Architecture'], arch_df['Parameters'], color='steelblue')
axes[0].set_xlabel('Number of parameters')
axes[0].set_title('Model Complexity Comparison')
axes[0].set_xscale('log')
# Use cases (keys must match the 'Use Case' column to avoid a KeyError)
colors_map = {'Classification': 'green', 'Detection': 'orange', 'Segmentation': 'red'}
bar_colors = [colors_map[uc] for uc in arch_df['Use Case']]
axes[1].bar(arch_df['Architecture'], [1, 1, 1], color=bar_colors, alpha=0.7)
axes[1].set_ylabel('Primary task')
axes[1].set_title('Architecture Use Cases')
axes[1].set_ylim([0, 1.5])
plt.tight_layout()
plt.savefig('cv_architecture_comparison.png', dpi=100, bbox_inches='tight')
print("\nArchitecture comparison saved as 'cv_architecture_comparison.png'")
# 9. Bounding-box visualization
print("\n=== 9. Bounding Box Visualization ===")

fig, ax = plt.subplots(figsize=(10, 8))
ax.imshow(X_images[0])
# Draw sample bounding boxes
bboxes = [
    (5, 5, 15, 15),  # x1, y1, x2, y2
    (18, 10, 28, 20),
    (8, 20, 18, 28)
]
for bbox in bboxes:
    rect = patches.Rectangle((bbox[0], bbox[1]), bbox[2] - bbox[0], bbox[3] - bbox[1],
                             linewidth=2, edgecolor='red', facecolor='none')
    ax.add_patch(rect)
ax.set_title('Bounding Box Detection Example')
ax.axis('off')
plt.savefig('bounding_boxes.png', dpi=100, bbox_inches='tight')
print("Bounding box visualization saved as 'bounding_boxes.png'")

print("\nComputer vision setup complete!")
Common CV Architectures
- Classification: ResNet, EfficientNet, Vision Transformer
- Detection: YOLOv5, Faster R-CNN, RetinaNet
- Segmentation: U-Net, DeepLabv3, Mask R-CNN
- Tracking: SORT, DeepSORT, ByteTrack
Image Preprocessing
- Resize to a standard size
- Normalize with ImageNet statistics
- Data augmentation (rotation, flipping, cropping)
- Color space conversion
Evaluation Metrics
- Classification: accuracy, precision, recall, F1
- Detection: mAP (mean Average Precision), IoU
- Segmentation: IoU, Dice coefficient, Hausdorff distance
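IoU and the Dice coefficient from the list above take only a few lines each; a minimal sketch for axis-aligned boxes and binary masks:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def dice(mask_a, mask_b):
    """Dice coefficient of two binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2 * inter / (mask_a.sum() + mask_b.sum())

print(round(iou((0, 0, 10, 10), (5, 5, 15, 15)), 4))  # 0.1429
mask_a = np.array([[1, 1], [0, 0]], dtype=bool)
mask_b = np.array([[1, 0], [1, 0]], dtype=bool)
print(dice(mask_a, mask_b))  # 0.5
```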
Deliverables
- Trained vision models
- Inference pipeline
- Performance evaluation
- Visualization results
- Model optimization report
- Deployment guide