LIU Guangwei,LEI Jian,GUO Zhiqing,et al. Cross-modal obstacle detection method for open-pit mine[J]. Coal Science and Technology,2025,53(11):327−340. DOI: 10.12438/cst.2025-0004

Cross-modal obstacle detection method for open-pit mine

In open-pit mining operations, safe mining is critical to ensuring the smooth progression of production processes. However, obstacle detection in open-pit mines faces severe challenges arising from complex environments. Traditional visual algorithms fail to meet the demands of complex open-pit mining scenarios: intricate road conditions, indistinct road status under low light, and frequent severe weather seriously degrade detection accuracy. In addition, cluttered mine backgrounds, in which mechanical equipment, material accumulations, and undulating terrain are intertwined, together with the diversity of obstacle types, expose significant limitations of traditional algorithms; the adaptability and robustness of existing improved algorithms in open-pit mine environments also require further enhancement. To achieve accurate and efficient detection of open-pit mine obstacles, ensure mine transportation safety, and overcome the bottlenecks of traditional algorithms, a cross-modal detection model, Mamba-YOLO-World, was proposed based on the YOLO architecture. This model employs multi-modal information, integrating text and image data to improve detection accuracy and obstacle detection performance under complex conditions. The MambaFusion-PAN architecture was introduced to optimize the fusion of multi-modal data, enabling precise extraction of key features of road obstacles; leveraging a state space model to capture intrinsic inter-modal correlations, the model achieves comprehensive detection in complex open-pit mining environments. Furthermore, PENet was integrated, which decomposes input images into sub-images of varying resolutions via Laplacian pyramids to capture global context, strengthens textures through edge branches, and significantly improves detection capabilities under dim lighting conditions. A combined text-image loss function was also introduced to boost the training accuracy of the model.
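The Laplacian-pyramid decomposition attributed to PENet above can be sketched in a few lines. This is a minimal illustration of the general technique, not the paper's implementation: it uses simple 2×2 average pooling and nearest-neighbour upsampling in place of whatever filters PENet actually applies, and the function names (`downsample`, `upsample`, `laplacian_pyramid`) are hypothetical.

```python
import numpy as np

def downsample(img):
    # 2x2 average pooling (a crude stand-in for Gaussian blur + subsample)
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img, shape):
    # nearest-neighbour upsampling back to the finer resolution
    up = img.repeat(2, axis=0).repeat(2, axis=1)
    return up[:shape[0], :shape[1]]

def laplacian_pyramid(img, levels=3):
    # Decompose an image into band-pass detail sub-images plus a low-res
    # residual: each level stores what the coarser level cannot represent,
    # and the coarsest level carries the global context.
    pyramid = []
    current = img.astype(np.float64)
    for _ in range(levels):
        low = downsample(current)
        pyramid.append(current - upsample(low, current.shape))
        current = low
    pyramid.append(current)
    return pyramid
```

By construction the decomposition is exactly invertible: starting from the coarsest level and repeatedly upsampling and adding the stored detail recovers the input image, which is why the detail branches can be enhanced independently (e.g. for texture strengthening in low light) without losing information.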
A dataset was constructed by selecting and organizing 6 000 images, with corresponding text information generated; these were integrated to form the foundational dataset. Techniques such as image enhancement, denoising, cropping, and scaling were applied to provide high-quality data support for the experiments. Comparative tests were conducted between the Mamba-YOLO-World model integrated with PENet and mainstream algorithms including YOLOv8x and YOLOv9e. The results demonstrated that the proposed model exhibited significant advantages in core metrics such as mAP@50, precision, and recall, reaching an mAP@50 of 64.8%. In-depth analysis revealed that PENet notably improved low-light detection performance, while the combined loss function provided robust support for training on multi-modal data. The results also indicated that although the Mamba-YOLO-World model integrated with PENet incurred slightly higher computational complexity, its accuracy advantages were evident. Compared with traditional algorithms, it enhances model adaptability and robustness through multi-modal data fusion, thus providing a cross-modal obstacle detection method for open-pit mines.
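The abstract does not specify the form of the combined text-image loss, but losses of this kind in open-vocabulary detectors are commonly region-text contrastive: region features are scored against class-text embeddings and pushed toward the embedding of their labelled class. The sketch below shows that generic pattern under those assumptions; the function name, the temperature `tau`, and the use of plain cross-entropy are all illustrative choices, not the paper's definition.

```python
import numpy as np

def region_text_similarity_loss(region_feats, text_feats, labels, tau=0.05):
    # L2-normalise both modalities, score every region against every
    # class-text embedding via cosine similarity, then apply softmax
    # cross-entropy toward each region's labelled class.
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = r @ t.T / tau                       # (n_regions, n_classes)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()
```

A loss of this shape ties the image branch to the text branch during training: regions that match their class description well incur a small loss, while mismatched region-text pairs are penalised, which is one plausible mechanism for the training-accuracy gains the abstract reports.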