Unsafe action detection algorithm for underground coal mine personnel based on YOLOv8

陈伟; 江志成; 田子建; 张帆; 刘毅

doi:10.12438/cst.2023-1772

Abstract: Interference information, low illumination, and occlusion by mechanical equipment in the complex underground coal mine environment challenge both the speed and the accuracy of existing object detection algorithms when they are applied to detecting unsafe actions of personnel. To address the heavy computation, large parameter counts, long inference times, and difficult feature extraction of existing object detection models, an improved YOLOv8l method called MAC-YOLO is proposed. MAC-YOLO replaces the convolutions in the baseline model with receptive-field attention convolution (RFAConv), which lets the model dynamically adjust receptive-field weights according to the complexity and importance of the input data. This addresses the parameter-sharing problem of standard convolution and allows the network to capture and use image information more effectively. An efficient multi-scale attention (EMA) module is also introduced into the baseline model. It fuses context information at different scales and learns effective channel descriptions without reducing the channel dimension during convolution, so the model produces better pixel-level attention on high-level feature maps. The module also captures cross-dimensional interactions and establishes dependencies between dimensions, so that the large local receptive fields of neurons efficiently yield clearer multi-scale features. This reduces the influence of interference in the image, further improves the model's ability to focus on object features, helps the convolutions extract abnormal actions of underground personnel efficiently, and improves detection accuracy.
In addition, a bounding box regression loss function ($L_{\mathrm{MPDIoU}}$) is introduced that directly minimizes the distances between the top-left corners and between the bottom-right corners of the predicted box and the ground-truth box. This solves the problem that the original loss function cannot optimize the model effectively when the predicted box and the ground-truth box have the same aspect ratio but different width and height values, and it both accelerates convergence and improves localization accuracy. To reduce the computational and structural complexity of the model and to increase the flexibility of the network, the neck of the baseline model is rebuilt with the slim-neck design paradigm: the GSbottleneck module strengthens the network's feature-processing ability, stacked GSConv modules improve the model's learning ability, and the VoV-GSCSP module improves feature utilization efficiency and network performance. Experimental results show that on the scenario-specific coal miner action dataset (MACD), compared with the baseline model YOLOv8l, MAC-YOLO improves mAP@0.5 and mAP@0.5:0.95 by 1.9% and 3.6%, respectively, and the FPS value is 81 ms. This indicates that MAC-YOLO meets the requirements of real-time operation and a lightweight model while maintaining good detection accuracy, demonstrating high flexibility, accuracy, and efficiency. Ablation experiments further confirm that each improved module contributes to the model's performance.
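
The corner-distance idea behind $L_{\mathrm{MPDIoU}}$ can be illustrated with a short sketch. The abstract states only that the loss minimizes the distances between the top-left and bottom-right corners of the two boxes. The rest of the code below is an assumption based on the commonly cited MPDIoU formulation and is not taken from the paper's implementation: the normalization by the squared image diagonal, the function names `iou` and `mpdiou_loss`, the `(x1, y1, x2, y2)` box layout, and the 640 by 640 image size.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mpdiou_loss(pred, target, img_w, img_h):
    """1 - (IoU - d1^2/norm - d2^2/norm), with corner distances d1 and d2.

    Assumption: norm is the squared image diagonal (img_w^2 + img_h^2),
    as in the commonly cited MPDIoU formulation.
    """
    # Squared distance between the two top-left corners.
    d1_sq = (pred[0] - target[0]) ** 2 + (pred[1] - target[1]) ** 2
    # Squared distance between the two bottom-right corners.
    d2_sq = (pred[2] - target[2]) ** 2 + (pred[3] - target[3]) ** 2
    norm = img_w ** 2 + img_h ** 2
    return 1.0 - (iou(pred, target) - d1_sq / norm - d2_sq / norm)


# Two concentric boxes with the same 2:1 aspect ratio but different sizes
# (hypothetical coordinates chosen for illustration).
target = (20.0, 20.0, 60.0, 40.0)   # width 40, height 20
pred = (10.0, 15.0, 70.0, 45.0)     # width 60, height 30

plain_iou_loss = 1.0 - iou(pred, target)
loss = mpdiou_loss(pred, target, img_w=640, img_h=640)
print(plain_iou_loss, loss)
```

In this example the two boxes share both center and aspect ratio, so a penalty built only on center distance and aspect-ratio difference would vanish, while the corner-distance terms remain non-zero and keep pushing the predicted corners toward the ground-truth corners.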
