2025, Vol. 55, No. 05, pp. 928-937
Object Detection Algorithm for Complex Traffic Scenes Based on Multi-Scale Feature Fusion
Foundation: National Natural Science Foundation of China (12373032)
Release date: 2025-02-25
Publication date: 2025-02-25
Online publication date: 2025-02-25
Abstract:

Object detection in road scenes is crucial for intelligent transportation systems and autonomous driving, but complex traffic conditions pose significant challenges. Existing detection algorithms suffer from false and missed detections caused by diverse object scales and background interference, and from accuracy loss caused by occlusion. To address these problems, an object detection algorithm for autonomous driving based on Multi-Scale Feature Fusion (MSFF) is proposed. First, a C2f-RepViT module is constructed in the Backbone network to generate more expressive feature representations, and the backbone is further optimized with the MSFF module to capture image details and contextual information accurately. Second, a Feature Bidirectional Diffusion Pyramid Network (FBDPN) structure is designed in the Neck layer to significantly improve the effect of MSFF. Finally, PIoU (Powerful-IoU) is introduced to improve the quality evaluation of anchor boxes, which accelerates model convergence and improves accuracy. Experimental results on the KITTI dataset show that, compared with the original YOLOv8 algorithm, the proposed algorithm improves precision by 2.2%, recall by 1.9%, mAP@0.5 by 2.1%, and mAP@0.5:0.95 by 1.6%, achieving better detection accuracy in autonomous driving scenarios.
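The reported metrics all rest on the intersection over union (IoU) between a predicted box and a ground-truth box. The sketch below is a generic illustration, not the paper's evaluation code: it computes IoU for axis-aligned boxes and derives precision and recall for a single image at an IoU threshold of 0.5. The `(x1, y1, x2, y2)` box format, the function names, and the greedy one-to-one matching are assumptions made for this example. By the usual convention, mAP@0.5 averages per-class average precision at this threshold, and mAP@0.5:0.95 further averages over thresholds from 0.5 to 0.95 in steps of 0.05.

```python
from typing import List, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Box], gts: List[Box], thr: float = 0.5
) -> Tuple[float, float]:
    """Precision and recall for one image and one class.

    Predictions are assumed to be sorted by descending confidence.
    Each ground-truth box can be matched by at most one prediction.
    """
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = -1, thr
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall
```

For instance, with predictions `[(0, 0, 10, 10), (50, 50, 60, 60)]` and ground truths `[(1, 1, 10, 10), (100, 100, 110, 110)]`, only the first prediction overlaps a ground-truth box by at least 0.5, so one of two predictions is a true positive and one of two ground-truth boxes is found.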

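Basic information:

CLC number: U463.6; U495; TP391.41

Citation:

[1] DONG Shan, CHEN Qingjiang. Object Detection Algorithm for Complex Traffic Scenes Based on Multi-Scale Feature Fusion[J]. Radio Engineering (无线电工程), 2025, 55(05): 928-937.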
