
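YOLO Models: A Complete Guide to Object Detection, Image Segmentation, and Pose Estimation

YOLO (You Only Look Once) is a widely used family of object detection models that in recent years has also been applied to image segmentation and pose estimation. This post walks through all three tasks, using code and an analysis of the prediction results to help you understand and use YOLO models.

The Ultralytics library returns every prediction as a Results object, whichever of the three tasks produced it. This post also shows how to process the prediction results of each task.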

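Task Overview and Comparison

YOLO supports three main vision tasks, each with its own output structure and typical applications:

Object Detection
- Output: bounding boxes (boxes) and class labels
- Characteristics: locates objects and classifies them
- Applications: object recognition, vehicle detection, face detection, etc.
Image Segmentation
- Output: pixel-level masks (masks) and class labels
- Characteristics: provides the precise outline of each object
- Applications: medical image analysis, scene understanding, etc.
Pose Estimation
- Output: human keypoint coordinates (keypoints), from which skeleton connections can be drawn
- Characteristics: recognizes human poses and movements
- Applications: motion analysis, pose tracking, behavior monitoring, etc.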

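Structure of the YOLO Prediction Results Object

Predictions from every task are wrapped in a Results object, which exposes the following common attributes:

- orig_img: the original image data
- orig_shape: the original image size as (height, width)
- path: path of the input image
- save_dir: directory where results are saved
- speed: timing information for the prediction

These attributes let us handle prediction results in a uniform way across tasks.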
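
As a minimal sketch of reading these common attributes, the helper below collects a few of them into a plain dictionary. The name summarize_result is hypothetical, and the sketch assumes that speed is a dictionary of per-stage times in milliseconds. A stand-in object with made-up values is used so that the example runs without loading a model.

```python
from types import SimpleNamespace

def summarize_result(result):
    """Collect common Results attributes into a plain dict (hypothetical helper)."""
    height, width = result.orig_shape  # orig_shape is (height, width)
    # Assumption: speed maps stage names to times in milliseconds
    total_ms = sum(v for v in result.speed.values() if v is not None)
    return {"path": result.path, "height": height, "width": width, "total_ms": total_ms}

# Stand-in for a real Results object; all values here are illustrative
fake = SimpleNamespace(
    orig_shape=(1080, 810),
    path="bus.jpg",
    save_dir="runs/detect/predict",
    speed={"preprocess": 2.0, "inference": 35.5, "postprocess": 1.5},
)
print(summarize_result(fake))
```

With a real model, the same helper could be applied to each element of the list returned by a prediction call.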

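Object Detection

Object Detection Code

The code below shows how to run object detection with YOLO: it identifies the objects in an image and draws the detections (bounding boxes and class labels) on the original image.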

import hashlib
import os

import cv2
from ultralytics import YOLO

OBJECT_DETECTION_MODEL_PATH = './models/object_detection.onnx'
TASK_NAME = 'detect'
# Load the exported model once so every prediction call can reuse it
model = YOLO(OBJECT_DETECTION_MODEL_PATH, task=TASK_NAME)

def generate_colors(names):
    """Map each class name to a fixed BGR color derived from its MD5 hash."""
    colors = {}
    for name in names:
        hash_object = hashlib.md5(name.encode())
        hash_int = int(hash_object.hexdigest(), 16)
        b = (hash_int & 0xFF0000) >> 16
        g = (hash_int & 0x00FF00) >> 8
        r = hash_int & 0x0000FF
        colors[name] = (b, g, r)  # OpenCV uses BGR order
    return colors

# Object detection on a single image
def predict_single_image_by_detect(image_path, out_image_path):
    # Create the directory of `out_image_path` if the path contains one
    out_dir = os.path.dirname(out_image_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)

    image_list = [image_path]
    results = model(image_list)

    for result in results:
        boxes = result.boxes
        if boxes is None:
            # Nothing to draw: save the untouched original image
            cv2.imwrite(out_image_path, result.orig_img)
            continue
        boxes_data = boxes.data.cpu().numpy()
        names = result.names
        class_names = list(names.values())

        color_map = generate_colors(class_names)

        img = result.orig_img

        for box in boxes_data:
            # Each row is [x1, y1, x2, y2, score, class_id]
            x1, y1, x2, y2, score, class_id = box
            x1, y1, x2, y2 = map(int, [x1, y1, x2, y2])
            class_name = names[int(class_id)]
            color = color_map[class_name]
            cv2.rectangle(img, (x1, y1), (x2, y2), color, 2)
            label = f'{class_name} {score:.2f}'
            # Keep the label inside the image when the box touches the top edge
            cv2.putText(img, label, (x1, max(y1 - 10, 0)),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.9, color, 2)
        print(f"Writing image to: {out_image_path}")
        cv2.imwrite(out_image_path, img)

if __name__ == '__main__':
    # Predict on a single image
    image_path = 'bus.jpg'
    out_image_path = os.path.splitext(image_path)[0] + '_predicted.jpg'
    predict_single_image_by_detect(image_path, out_image_path)

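Analyzing Object Detection Results

For object detection, the most important fields of the Results object are:

boxes: bounding box coordinates, confidence scores, and class IDs.
names: mapping from class ID to class label.
orig_img: the original image data.

Each bounding box holds six values:

[x1, y1, x2, y2, score, class_id]
# x1, y1: top-left corner
# x2, y2: bottom-right corner
# score: detection confidence
# class_id: class ID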
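
To make the six-value layout concrete, here is a small sketch that turns such rows into labeled records and drops low-confidence detections. The function name parse_boxes, the 0.25 default threshold, and the sample rows are all illustrative assumptions, not output from a real model.

```python
def parse_boxes(rows, names, min_score=0.25):
    """Convert [x1, y1, x2, y2, score, class_id] rows into labeled dicts."""
    detections = []
    for x1, y1, x2, y2, score, class_id in rows:
        if score < min_score:
            continue  # skip detections below the confidence threshold
        detections.append({
            "label": names[int(class_id)],  # class_id arrives as a float
            "score": float(score),
            "box": (int(x1), int(y1), int(x2), int(y2)),  # integer pixels for OpenCV
        })
    return detections

# Illustrative rows only, in the same layout as a converted boxes array
rows = [
    [22.9, 231.3, 805.0, 756.8, 0.87, 5.0],
    [48.6, 398.6, 245.3, 902.7, 0.12, 0.0],
]
names = {0: "person", 5: "bus"}
print(parse_boxes(rows, names))
```

Separating parsing from drawing in this way keeps the OpenCV calls simple and makes the filtering logic easy to test on its own.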

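Image Segmentation

Image Segmentation Code

Image segmentation is finer-grained than object detection: besides identifying the class of each object, it also extracts the object's exact outline.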

import hashlib
import os

import cv2
import numpy as np
from ultralytics import YOLO

SEGMENT_MODEL_PATH = "./models/segmentation.onnx"
TASK_NAME = 'segment'
model = YOLO(SEGMENT_MODEL_PATH, task=TASK_NAME)

def generate_colors(names):
    """Same helper as in the detection example: one fixed BGR color per class name."""
    colors = {}
    for name in names:
        hash_int = int(hashlib.md5(name.encode()).hexdigest(), 16)
        colors[name] = ((hash_int & 0xFF0000) >> 16, (hash_int & 0x00FF00) >> 8, hash_int & 0x0000FF)
    return colors

# Segmentation prediction on a single image
def predict_single_image_by_segment(image_path, out_image_path):
    # Create the output directory if the path contains one
    out_dir = os.path.dirname(out_image_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)

    results = model.predict(source=image_path)

    for result in results:
        if result.masks is None:
            # No instances found: save the untouched original image
            cv2.imwrite(out_image_path, result.orig_img)
            continue
        masks = result.masks.data.cpu().numpy()
        boxes = result.boxes.data.cpu().numpy()
        label_map = result.names
        color_map = generate_colors(label_map.values())

        img_with_masks = result.orig_img.copy()

        for i, mask in enumerate(masks):
            # Resize the binary mask to the original image size (width, height)
            mask = mask.astype(np.uint8)
            mask = cv2.resize(mask, (result.orig_shape[1], result.orig_shape[0]),
                              interpolation=cv2.INTER_NEAREST)

            # Random color per instance so overlapping objects stay distinguishable
            color = np.random.randint(0, 255, (3,), dtype=np.uint8)
            colored_mask = np.zeros_like(result.orig_img, dtype=np.uint8)
            colored_mask[mask > 0] = color

            img_with_masks = cv2.addWeighted(img_with_masks, 1, colored_mask, 0.5, 0)

            # The i-th box belongs to the i-th mask
            box_data = boxes[i]
            x1, y1, x2, y2 = map(int, box_data[:4])
            class_name = label_map[int(box_data[5])]
            score = box_data[4]
            cv2.rectangle(img_with_masks, (x1, y1), (x2, y2), color_map[class_name], 2)
            label = f"{class_name}: {score:.4f}"
            cv2.putText(img_with_masks, label, (x1, max(y1 - 10, 0)), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

        cv2.imwrite(out_image_path, img_with_masks)
        print(f"Prediction saved to {out_image_path}")

if __name__ == '__main__':
    image_path = 'bus.jpg'
    out_image_path = os.path.splitext(image_path)[0] + '_segmented.jpg'
    predict_single_image_by_segment(image_path, out_image_path)

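Analyzing Image Segmentation Results

Fields of the Results object that matter for segmentation:

masks: instance segmentation mask data.
boxes: bounding box information.
names: mapping from class ID to class label.

The mask data is a set of binary images. Each mask must be resized to the original image size and then overlaid on the original image for visualization.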
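
The resize-and-overlay step can also be sketched with NumPy alone, which makes the arithmetic explicit. This is an illustrative alternative rather than the method used in the listing: it uses nearest-neighbor index sampling instead of cv2.resize and a true alpha blend instead of cv2.addWeighted. The function names and the toy arrays are assumptions.

```python
import numpy as np

def resize_mask_nearest(mask, height, width):
    """Nearest-neighbor resize of a 2D mask to (height, width)."""
    src_h, src_w = mask.shape
    rows = np.arange(height) * src_h // height  # source row for each target row
    cols = np.arange(width) * src_w // width    # source column for each target column
    return mask[rows[:, None], cols]

def overlay_mask(image, mask, color, alpha=0.5):
    """Alpha-blend `color` (BGR) into `image` wherever `mask` is non-zero."""
    out = image.astype(np.float32)
    region = mask > 0
    out[region] = (1 - alpha) * out[region] + alpha * np.asarray(color, dtype=np.float32)
    return out.round().astype(np.uint8)

# Toy data: a 2x2 mask upscaled onto a black 4x4 BGR image
small_mask = np.array([[1, 0], [0, 0]], dtype=np.uint8)
image = np.zeros((4, 4, 3), dtype=np.uint8)
full_mask = resize_mask_nearest(small_mask, 4, 4)
blended = overlay_mask(image, full_mask, color=(0, 0, 200))
print(full_mask)
print(blended[0, 0], blended[3, 3])
```

Unlike an additive overlay, the alpha blend darkens the underlying pixels in proportion to alpha, so bright regions do not saturate.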

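Pose Estimation

Pose Estimation Code

The goal of pose estimation is to detect the keypoints of the human body and draw the skeleton that connects them.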

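import os

import cv2
from ultralytics import YOLO

POSE_MODEL_PATH = './models/pose.onnx'
TASK_NAME = 'pose'
model = YOLO(POSE_MODEL_PATH, task=TASK_NAME)

# Assumption: the model outputs the 17 COCO keypoints (0 = nose ... 16 = right ankle).
# Each pair is a 0-based (part_a, part_b) connection; adjust for other keypoint layouts.
skeleton = [
    (15, 13), (13, 11), (16, 14), (14, 12), (11, 12),  # legs and hips
    (5, 11), (6, 12), (5, 6),                          # torso
    (5, 7), (6, 8), (7, 9), (8, 10),                   # arms
    (1, 2), (0, 1), (0, 2), (1, 3), (2, 4),            # face
    (3, 5), (4, 6),                                    # ears to shoulders
]
KEYPOINT_THRESHOLD = 0.5  # minimum confidence for drawing a keypoint or a bone

def predict_single_image_by_pose(image_path, out_image_path):
    # Create the output directory if the path contains one
    out_dir = os.path.dirname(out_image_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)

    results = model.predict(source=image_path)

    for result in results:
        if result.keypoints is None:
            continue
        if result.boxes is None:
            continue

        orig_img = result.orig_img
        keypoints = result.keypoints.data.cpu().numpy()
        boxes = result.boxes.data.cpu().numpy()

        for box_data, kpts in zip(boxes, keypoints):
            # Draw each keypoint that is confident enough
            for keypoint in kpts:
                x, y, score = keypoint
                if score > KEYPOINT_THRESHOLD:
                    cv2.circle(orig_img, (int(x), int(y)), 3, (255, 0, 0), -1)

            # Draw a bone only when both of its endpoints are confident
            for connection in skeleton:
                part_a, part_b = connection
                if kpts[part_a][2] > KEYPOINT_THRESHOLD and kpts[part_b][2] > KEYPOINT_THRESHOLD:
                    x1, y1 = int(kpts[part_a][0]), int(kpts[part_a][1])
                    x2, y2 = int(kpts[part_b][0]), int(kpts[part_b][1])
                    cv2.line(orig_img, (x1, y1), (x2, y2), (0, 255, 255), 1)

        cv2.imwrite(out_image_path, orig_img)

if __name__ == '__main__':
    image_path = 'bus.jpg'
    out_image_path = os.path.splitext(image_path)[0] + '_posed.jpg'
    predict_single_image_by_pose(image_path, out_image_path)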

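Analyzing Pose Estimation Results

keypoints: coordinates and confidence of each human keypoint.
boxes: bounding boxes of the detected people.
names: class label mapping, usually containing only the 'person' class.

Each keypoint has the following structure:

[x, y, confidence]  # coordinates plus a confidence score for each keypoint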
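
The confidence value is what decides whether a bone of the skeleton should be drawn. The sketch below isolates that decision from the drawing code: given one person's keypoints and a list of index pairs, it returns only the line segments whose two endpoints are both above a threshold. The function name, the 0.5 threshold, and the three-point toy skeleton are illustrative assumptions.

```python
def visible_connections(kpts, skeleton, threshold=0.5):
    """Return ((x1, y1), (x2, y2)) segments whose endpoints both pass the threshold."""
    segments = []
    for a, b in skeleton:
        xa, ya, ca = kpts[a]
        xb, yb, cb = kpts[b]
        if ca > threshold and cb > threshold:
            segments.append(((int(xa), int(ya)), (int(xb), int(yb))))
    return segments

# Toy example: three keypoints, the last one with low confidence
kpts = [[100.0, 50.0, 0.9], [120.0, 80.0, 0.8], [0.0, 0.0, 0.1]]
skeleton = [(0, 1), (1, 2)]
print(visible_connections(kpts, skeleton))
```

Each returned segment can be passed directly to a line-drawing call, since the coordinates are already integers.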

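Practical Advice

Data preprocessing
- Make sure the input image size suits the model.
- Check the image color format (OpenCV uses BGR channel order by default).
- Apply image augmentation where needed.
Result handling
- Always check for empty (None) results.
- Convert tensor data to NumPy arrays.
- Convert coordinates to integers for compatibility with OpenCV.
Performance optimization
- Process images in batches where possible to improve throughput.
- Use a GPU to accelerate inference.
- Choose a model size that matches the actual requirements.
Visualization
- Assign a fixed color to each class so classes are easier to tell apart.
- Adjust line thickness and label font size to keep the predictions readable.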
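
For the batching advice, one simple approach is to split the list of image paths into fixed-size chunks and pass each chunk to the model in a single call, as the detection example does with its one-element list. The helper below is a hypothetical sketch of the chunking part only; the batch size of 2 and the file names are placeholders.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

paths = ["a.jpg", "b.jpg", "c.jpg"]  # placeholder file names
batches = list(chunked(paths, 2))
print(batches)

# With a loaded model, each batch could then be predicted in one call, e.g.:
# for batch in chunked(paths, 2):
#     results = model(batch)
```

A suitable batch size depends on image resolution and available memory, so it is worth measuring rather than guessing.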

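Summary

YOLO handles object detection, image segmentation, and pose estimation within a single framework, and this versatility has made it a popular choice in computer vision.

Data structure differences
- Object detection: works with boxes data.
- Image segmentation: works with both masks and boxes.
- Pose estimation: works with keypoints and the skeleton structure.
Applications
- Object detection: locating and classifying objects.
- Image segmentation: precise outline analysis.
- Pose estimation: human motion tracking and behavior analysis.
General workflow
- Load and initialize the model.
- Preprocess the data.
- Process and visualize the results.
- Check for errors and exceptions.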
