Abstract:
Underground coal mines are complex environments with interfering information, low illumination, and occlusion by mechanical equipment, all of which challenge the speed and accuracy of existing object detection algorithms when they are applied to detecting unsafe actions of personnel. An improved YOLOv8l method, called MAC-YOLO, is proposed to address the heavy computation, large number of parameters, long inference time, and difficult feature extraction of existing object detection models. First, the convolutions in the baseline model are replaced with receptive field attention convolution (RFAConv), which lets the model dynamically adjust the receptive-field weights according to the complexity and importance of the input data and resolves the parameter-sharing problem of standard convolution, so that the network captures and uses the information in the image more effectively. Second, an efficient multi-scale attention (EMA) module is introduced into the baseline model. EMA integrates context information at different scales and learns effective channel descriptions without channel dimensionality reduction during convolution, so that the model produces better pixel-level attention for high-level feature maps. It also captures cross-dimensional interactions and establishes dependencies between dimensions, so that the large local receptive fields of neurons yield clearer multi-scale features. This reduces the influence of interference in the image, strengthens the model's focus on object features, and helps the model extract abnormal actions of personnel in the coal mine efficiently, thereby improving detection accuracy.
In addition, the bounding box regression loss function $L_{\mathrm{MPDIoU}}$ is introduced to directly minimize the distances between the top-left points and between the bottom-right points of the predicted box and the ground-truth box. This solves the problem that the original loss function cannot be optimized effectively when the predicted box and the ground-truth box have the same aspect ratio but different width and height values; it also accelerates convergence and improves localization accuracy. To reduce the computational cost and structural complexity of the model and to make the network more flexible, the neck of the baseline model is rebuilt following the slim-neck design paradigm: the GSbottleneck module strengthens the processing of network features, stacked GSConv modules improve the learning ability of the model, and the VoV-GSCSP module improves feature utilization efficiency and network performance. Experimental results on the scenario-specific coal miner action dataset (MACD) show that, compared with the baseline model YOLOv8l, MAC-YOLO improves mAP@0.5 and mAP@0.5:0.95 by 1.9% and 3.6%, respectively, and reaches an FPS of 81. This shows that MAC-YOLO meets the requirements for a real-time, lightweight model while maintaining good detection accuracy, demonstrating high flexibility, accuracy, and efficiency. Ablation experiments further confirm that each improved module contributes to the performance of the model.
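
The corner-distance idea behind the MPDIoU regression loss can be illustrated with a minimal sketch. The abstract does not give the formula, so the code below assumes the commonly cited MPDIoU definition: IoU minus the squared top-left and bottom-right corner distances, each normalized by the squared diagonal of the input image. The function name `mpdiou_loss`, the `(x1, y1, x2, y2)` box layout, and the `eps` stabilizer are illustrative choices and are not taken from the MAC-YOLO implementation.

```python
def mpdiou_loss(pred, gt, img_w, img_h, eps=1e-9):
    """Sketch of an MPDIoU-style loss for one pair of axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with (x1, y1) the top-left corner and
    (x2, y2) the bottom-right corner. This layout is an assumption.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h

    # Union area and plain IoU.
    area_pred = (px2 - px1) * (py2 - py1)
    area_gt = (gx2 - gx1) * (gy2 - gy1)
    iou = inter / (area_pred + area_gt - inter + eps)

    # Squared distances between matching corners.
    d_top_left = (px1 - gx1) ** 2 + (py1 - gy1) ** 2
    d_bottom_right = (px2 - gx2) ** 2 + (py2 - gy2) ** 2

    # Normalize by the squared image diagonal (assumed normalization).
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d_top_left / norm - d_bottom_right / norm
    return 1.0 - mpdiou


# Concentric boxes with the same 2:1 aspect ratio but different sizes.
gt_box = (25.0, 12.5, 75.0, 37.5)
far_pred = (0.0, 0.0, 100.0, 50.0)
near_pred = (20.0, 10.0, 80.0, 40.0)
loss_far = mpdiou_loss(far_pred, gt_box, 640, 640)
loss_near = mpdiou_loss(near_pred, gt_box, 640, 640)
```

In this sketch the two predictions share the aspect ratio and center of the ground-truth box, which is the case the abstract singles out. The corner-distance terms are nonzero whenever the corners differ, so the loss still separates the two predictions and shrinks as the predicted corners move toward the ground-truth corners.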