Intelligent Part Identification and Grabbing Method Based on SGV-YOLOv8 Model

doi:10.3969/j.issn.1004-132X.2026.02.019

Abstract

To address slow part identification and low grabbing success rates when industrial robots pick up mechanical parts, an intelligent part identification and grabbing method based on the SGV-YOLOv8 model is proposed. A monocular camera and a laser ranging module are combined into a depth vision detection device that provides three-dimensional positioning of mechanical parts. With the YOLOv8 model as the base architecture, the StarNet network replaces the original backbone structure, and the GSConv module and VoV-GSCSP structure are introduced in the neck, which reduces model complexity while improving detection speed and grabbing success rate. The experimental results show that, compared with the original model, the designed SGV-YOLOv8 model (StarNet-GSConv-VoV YOLOv8) reduces the number of model parameters and the number of floating-point operations (GFLOPs) by 51.9% and 51%, respectively, while the number of detection frames per second (FPS) increases by 37.6%. The part grabbing success rate of the constructed industrial robot grabbing device is 80%.

Key words: mechanical arm grabbing, machine vision, laser ranging module, YOLOv8 model, part identification
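The abstract states that a monocular camera and a laser ranging module together give the three-dimensional position of a part, but it does not spell out the geometry. As a minimal sketch, assuming an ideal pinhole camera without lens distortion and assuming the laser reading can be treated as the depth along the optical axis at the detected part center, a pixel can be back-projected as follows. The `Intrinsics` class, the `pixel_to_camera` function, and the numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Pinhole camera intrinsics in pixels (hypothetical values in the demo)."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y


def pixel_to_camera(u: float, v: float, depth: float, k: Intrinsics):
    """Back-project pixel (u, v) at a known depth into camera coordinates.

    Assumes depth is measured along the optical axis (Z), which is a
    simplification of whatever camera-laser calibration the paper uses.
    """
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


if __name__ == "__main__":
    k = Intrinsics(fx=800.0, fy=800.0, cx=320.0, cy=240.0)
    # Center of a detected bounding box, with a 0.5 m laser reading.
    print(pixel_to_camera(400.0, 240.0, 0.5, k))
```

In practice the laser beam and the optical axis are offset, so a calibrated transform between the two sensors would replace the direct use of the laser reading as Z.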

CLC number: TP241.2

LUO Hang, YANG Ye, CHEN Benyong. Intelligent Part Identification and Grabbing Method Based on SGV-YOLOv8 Model[J]. China Mechanical Engineering, 2026, 37(2): 442-451.

Link to this article: https://www.cmemo.org.cn/CN/10.3969/j.issn.1004-132X.2026.02.019

https://www.cmemo.org.cn/CN/Y2026/V37/I2/442

Figures and tables: 16

Fig.1 Schematic diagram of intelligent part identification and grabbing system

Fig.2 Calibration diagram of laser ranging module and camera

Fig.3 SGV-YOLOv8 network architecture

Fig.4 StarNet network architecture

Fig.5 Architecture of VoV-GSCSP module and GSConv module

Fig.6 Visualization of the dataset

Tab.1 Model hyperparameter settings

Configuration item	Value
Epochs	150
lr0	0.01
Optimizer	SGD
Momentum	0.937
Weight decay	0.0005
Batch size	64
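The settings in Tab.1 can be collected into a single training configuration. The sketch below assumes the Ultralytics Python package was used, which the source does not state; the argument names (`epochs`, `lr0`, `optimizer`, `momentum`, `weight_decay`, `batch`) are Ultralytics training arguments, while `self_parts.yaml` is a hypothetical dataset file and `yolov8s.yaml` is only the stock baseline configuration, not the SGV-YOLOv8 network.

```python
# Hyperparameters from Tab.1, keyed by Ultralytics argument names.
# Using Ultralytics at all is an assumption, not something the source confirms.
TRAIN_ARGS = {
    "epochs": 150,
    "lr0": 0.01,            # initial learning rate
    "optimizer": "SGD",
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "batch": 64,
}


def train(data_yaml: str = "self_parts.yaml", model_cfg: str = "yolov8s.yaml"):
    """Train a model with the Tab.1 settings.

    data_yaml is a hypothetical dataset description file; model_cfg is the
    stock YOLOv8s configuration, used here as a stand-in baseline.
    """
    from ultralytics import YOLO  # imported lazily; requires the ultralytics package

    model = YOLO(model_cfg)
    return model.train(data=data_yaml, **TRAIN_ARGS)


if __name__ == "__main__":
    for name, value in TRAIN_ARGS.items():
        print(f"{name} = {value}")
```

Calling `train()` starts an actual training run, so it is left out of the demo section; printing the dictionary is enough to check the values against Tab.1.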

Tab.2 Performance comparison of different algorithms on the Self Parts dataset

Model	Parameter size/MB	GFLOPs	Inference speed/(frames·s⁻¹)	mAP@0.5/%
Faster R-CNN	140.8	406.6	10	83.1
SSD	50.2	360.9	107	75.7
YOLOv5	13.8	15.9	11	98.8
YOLOv6s	8.3	11.8	263	98.0
YOLOv8m	49.6	79.1	208	99.2
YOLOv8n	6.0	8.9	164	99.0
YOLOv8s	21.4	28.8	303	98.9

Tab.3 Comparison of different backbone networks

Model	Parameter size/MB	GFLOPs	mAP@0.5/%
MobileNet	11.2	22.6	98.5
ShuffleNet	12.4	17.4	98.7
GhostNet	12.4	17.3	98.6
FasterNet	16.7	21.7	99.0
StarNet	11.1	17.3	98.7

Tab.4 Ablation test results of YOLOv8

	Original YOLOv8	YOLOv8 + StarNet	YOLOv8 + GSConv	YOLOv8 + VoV-GSCSP	YOLOv8 + StarNet + GSConv	YOLOv8 + StarNet + VoV-GSCSP	YOLOv8 + GSConv + VoV-GSCSP	YOLOv8 + StarNet + GSConv + VoV-GSCSP
YOLOv8	√	√	√	√	√	√	√	√
StarNet		√			√	√		√
GSConv			√		√		√	√
VoV-GSCSP				√		√	√	√
Parameter size/MB	21.4	11.1	5.81	19.3	12.0	12.7	19.9	11.1
GFLOPs	28.8	17.3	26.2	21.3	16.9	17.3	25.1	14.1
Inference speed/(frames·s⁻¹)	303.4	384.6	277.5	286	323.3	344.9	293.7	417.2
mAP@0.5/%	98.9	98.7	99.0	98.9	98.5	98.7	99.2	98.9
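The relative changes between the original YOLOv8 column and the full StarNet + GSConv + VoV-GSCSP column of Tab.4 can be recomputed with a few lines. This is a minimal sketch that uses only the rounded values printed in the table; the `relative_change` helper and the dictionary keys are illustrative names.

```python
def relative_change(baseline: float, improved: float) -> float:
    """Signed percentage change from baseline to improved."""
    return (improved - baseline) / baseline * 100.0


# (original YOLOv8, full SGV-YOLOv8) pairs read from Tab.4.
TAB4 = {
    "parameter size/MB": (21.4, 11.1),
    "GFLOPs": (28.8, 14.1),
    "inference speed/FPS": (303.4, 417.2),
}

if __name__ == "__main__":
    for name, (baseline, improved) in TAB4.items():
        print(f"{name}: {relative_change(baseline, improved):+.1f}%")
```

With the rounded table entries this gives roughly -48.1% for model size, -51.0% for GFLOPs, and +37.5% for inference speed. Tab.4 reports model size in MB rather than a parameter count, so the first value is not directly comparable with the 51.9% parameter reduction quoted in the abstract.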

Fig.7 Scatter plot of ablation experiment results

Fig.8 Comparison of results between YOLOv8 and SGV-YOLOv8 network models

Fig.9 Detection comparison of different models on the selected test set

Tab.5 Generalization experiments on the Industrial Tool dataset

Model	Parameter size/MB	GFLOPs	Inference speed FPS/(frames·s⁻¹)	mAP@0.5/%	mAP@0.5:0.95/%
YOLOv5	13.8	15.9	21.2	99.2	76.9
YOLOv6s	8.3	11.8	266.0	98.7	74.4
YOLOv8n	6.0	8.9	128.2	99.0	77.9
YOLOv8s	21.4	28.8	312.5	98.9	76.8
YOLOv8m	49.6	79.1	288.9	99.1	77.4
SGV-YOLOv8	11.1	14.1	344.8	99.2	78.6

Fig.10 Photograph of the mechanical arm

Tab.6 Mechanical arm part grabbing results based on the improved YOLOv8

Model	Number of trials	Parts with positioning failure	Parts misidentified	Successful grabs	Success rate/%
YOLOv8	30	7	2	21	70
SGV-YOLOv8	30	6	0	24	80
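The success rates in Tab.6 follow directly from the trial counts, and the failure counts plus the successes should add up to the number of trials. The short check below is a sketch; the `GrabResult` class and its field names are hypothetical, and only the numbers come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GrabResult:
    """One row of Tab.6 (field names are illustrative)."""
    model: str
    trials: int
    positioning_failures: int
    misidentified: int
    successes: int

    def is_consistent(self) -> bool:
        # Every trial should end as a positioning failure, a misidentification, or a success.
        return self.positioning_failures + self.misidentified + self.successes == self.trials

    def success_rate(self) -> float:
        return 100.0 * self.successes / self.trials


ROWS = [
    GrabResult("YOLOv8", 30, 7, 2, 21),
    GrabResult("SGV-YOLOv8", 30, 6, 0, 24),
]

if __name__ == "__main__":
    for row in ROWS:
        print(row.model, row.is_consistent(), f"{row.success_rate():.0f}%")
```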
