Abstract:
In open-pit mining operations, safe mining is key to keeping production processes running smoothly. However, obstacle detection in open-pit mines faces severe challenges posed by complex environments, and traditional visual algorithms struggle to meet the requirements of these production scenarios. The reasons include complex road environments, road conditions that are hard to distinguish under low light, and frequent harsh weather, all of which seriously degrade detection accuracy; moreover, in the cluttered mine background, mechanical equipment, piles of material, and undulating terrain are intertwined, and obstacles come in many types. These factors expose the limitations of traditional algorithms, and the adaptability and robustness of existing improved algorithms in open-pit mine environments still need to be strengthened. To detect obstacles in open-pit mines accurately and efficiently, ensure the safety of mine transportation, and break through the bottlenecks of traditional algorithms, a cross-modal detection model, Mamba-YOLO-World, is proposed based on the YOLO architecture. The model adopts multimodal information, combining text with images to improve detection accuracy and obstacle detection performance under complex conditions. It introduces the MambaFusion-PAN architecture to optimize the fusion of multimodal data and to extract key features of road obstacles accurately; by means of a state-space model, it captures the intrinsic correlations between modalities, enabling comprehensive detection of the complex production environment in open-pit mines. In addition, it integrates PENet: the input image is decomposed through a Laplacian pyramid into sub-images at different resolutions to capture global context, while edge branches enhance textures, greatly improving detection under dim lighting. A combined text-image loss function is also introduced to raise training accuracy. After 6,000 images were screened and organized, corresponding text descriptions were created; together they form the foundation of the dataset. Image enhancement, noise reduction, cropping, and scaling were applied to provide high-quality data for the experiments. In the experiments, the Mamba-YOLO-World model integrated with PENet was compared with mainstream algorithms such as YOLOv8x and YOLOv9e. The results show that the model holds clear advantages on core indicators including mAP50, precision, and recall, with mAP50 reaching 64.8%. Further analysis reveals that PENet markedly improves low-light detection, and the combined loss function strengthens training on multimodal data. In conclusion, although the Mamba-YOLO-World model integrated with PENet has slightly higher computational complexity, it offers a clear accuracy advantage; compared with traditional algorithms, multimodal data fusion enhances its adaptability and robustness, providing a cross-modal obstacle detection method for open-pit mining.
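The abstract credits a state-space model with capturing correlations between the text and image modalities. As a rough illustration of the idea only (not the paper's actual MambaFusion-PAN, whose selective-scan details are not given here), the sketch below runs a plain linear state-space recurrence over concatenated image and text tokens; the class name, dimensions, and initialization are all hypothetical.

```python
import torch
import torch.nn as nn

class SimpleSSMFusion(nn.Module):
    """Illustrative linear state-space scan over a fused token sequence.

    Hypothetical simplification of Mamba-style cross-modal fusion:
    image and text tokens are concatenated and scanned recurrently
    (h_t = A h_{t-1} + B x_t, y_t = C h_t), so each output token can
    carry context from both modalities.
    """

    def __init__(self, dim, state_dim=16):
        super().__init__()
        self.A = nn.Parameter(torch.eye(state_dim) * 0.9)  # state transition
        self.B = nn.Linear(dim, state_dim, bias=False)     # input projection
        self.C = nn.Linear(state_dim, dim, bias=False)     # output projection

    def forward(self, image_tokens, text_tokens):
        # image_tokens: (batch, N_img, dim); text_tokens: (batch, N_txt, dim)
        x = torch.cat([image_tokens, text_tokens], dim=1)
        batch, length, _ = x.shape
        h = x.new_zeros(batch, self.A.shape[0])
        outputs = []
        for t in range(length):
            h = h @ self.A.T + self.B(x[:, t])  # recurrent state update
            outputs.append(self.C(h))
        return torch.stack(outputs, dim=1)      # fused tokens, same length
```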
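PENet's contribution is described as decomposing the input into multi-resolution sub-images via a Laplacian pyramid, with the coarsest level carrying global context and the band-pass levels carrying edge and texture detail. A minimal sketch of that decomposition with OpenCV follows; the function name and level count are illustrative, not PENet's actual implementation.

```python
import cv2
import numpy as np

def laplacian_pyramid(image, levels=3):
    """Decompose an image into a Laplacian pyramid.
    Each band-pass level holds high-frequency detail (edges, texture);
    the final entry is the low-resolution residual with global context."""
    pyramid = []
    current = image.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)   # high-frequency sub-image
        current = down
    pyramid.append(current)            # coarsest level: global context
    return pyramid
```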
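The combined text-image loss is not specified in the abstract. Assuming a YOLO-World-style region-text contrastive term added on top of the usual detection losses, a minimal sketch might look like the following; the function name, temperature value, and the decomposition of the total objective are assumptions.

```python
import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_embeds, text_embeds, targets, tau=0.05):
    """Align detected-region embeddings with class-name text embeddings.
    region_embeds: (N, D), text_embeds: (C, D),
    targets: (N,) index of the matching text entry for each region."""
    region_embeds = F.normalize(region_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = region_embeds @ text_embeds.t() / tau  # cosine similarity / temperature
    return F.cross_entropy(logits, targets)

# Hypothetical total objective: contrastive term plus standard detection losses.
# total_loss = region_text_contrastive_loss(r, t, y) + box_loss + objectness_loss
```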
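The dataset preparation names four steps: image enhancement, noise reduction, cropping, and scaling. One possible chain implementing those steps with standard OpenCV operations is sketched below; the specific operators (non-local means denoising, CLAHE contrast enhancement, center crop) and the 640-pixel target size are assumptions, not the paper's stated pipeline.

```python
import cv2

def preprocess(path, size=640):
    """Illustrative preprocessing chain for the steps named in the abstract."""
    img = cv2.imread(path)
    img = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)  # noise reduction
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))                          # contrast enhancement
    img = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    img = img[top:top + side, left:left + side]                      # center crop
    return cv2.resize(img, (size, size))                             # scaling
```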