

Cross-modal obstacle detection method for open-pit mine


     

    Abstract: In open-pit mining operations, safe mining is key to keeping production running smoothly, yet obstacle detection in open-pit mines faces severe challenges from the complex environment. Traditional visual algorithms struggle to meet the demands of these production scenarios: road environments are complex, low light makes road conditions hard to distinguish, and frequent harsh weather seriously degrades detection accuracy; the cluttered mine background intertwines mechanical equipment, stockpiles, and undulating terrain and contains many types of obstacles. These factors expose the limitations of traditional algorithms, and the adaptability and robustness of existing improved algorithms in open-pit environments still need strengthening. To detect obstacles accurately and efficiently, safeguard mine transportation, and break through these bottlenecks, a cross-modal detection model, Mamba-YOLO-World, is proposed on the basis of the YOLO architecture. The model exploits multimodal information, combining text with images to improve detection accuracy and obstacle detection in complex conditions. It introduces the MambaFusion-PAN architecture to optimize the fusion of multimodal data and to extract key road-obstacle features, using a state-space model to capture the intrinsic correlations between modalities for comprehensive perception of the complex open-pit production environment. A pyramid enhancement network (PENet) is also integrated: a Laplacian pyramid decomposes the input image into sub-images at different resolutions to capture global context, while edge branches strengthen texture, greatly improving detection under dim lighting. In addition, a combined text-image loss function is introduced to raise training accuracy. A dataset was built from 6 000 screened and curated images paired with corresponding text descriptions, and techniques such as image enhancement, denoising, cropping, and scaling provided high-quality data for the experiments. In comparative tests against mainstream algorithms such as YOLOv8x and YOLOv9e, the PENet-enhanced Mamba-YOLO-World model shows clear advantages on core metrics including mAP50, precision, and recall, reaching an mAP50 of 64.8%. Further analysis shows that PENet markedly improves low-light detection and that the combined loss function enables effective training on multimodal data. Although the PENet-enhanced Mamba-YOLO-World model has slightly higher computational complexity, its accuracy advantage is evident; compared with traditional algorithms, multimodal data fusion improves its adaptability and robustness, providing a cross-modal obstacle detection method for open-pit mining.
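The abstract describes PENet as decomposing the input image into multi-resolution sub-images via a Laplacian pyramid. As a rough sketch (not the paper's implementation), such a decomposition might look like the following, using a simple separable blur and nearest-neighbor resampling as stand-ins for the actual filters:

```python
import numpy as np

def _blur(img: np.ndarray) -> np.ndarray:
    """Separable 5-tap binomial blur with edge padding (Gaussian approximation)."""
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0
    padded = np.pad(img, 2, mode="edge")
    # horizontal pass, then vertical pass
    tmp = sum(k[i] * padded[:, i:i + img.shape[1]] for i in range(5))
    return sum(k[i] * tmp[i:i + img.shape[0], :] for i in range(5))

def laplacian_pyramid(img: np.ndarray, levels: int = 3):
    """Decompose an image into `levels` band-pass detail sub-images plus a
    low-resolution residual that carries the global context."""
    pyramid = []
    current = img.astype(np.float64)
    for _ in range(levels):
        down = _blur(current)[::2, ::2]                        # blur + downsample by 2
        up = np.repeat(np.repeat(down, 2, axis=0), 2, axis=1)  # crude upsample
        up = up[: current.shape[0], : current.shape[1]]
        pyramid.append(current - up)                           # band-pass detail layer
        current = down
    pyramid.append(current)                                    # coarse residual
    return pyramid

def reconstruct(pyramid):
    """Invert the decomposition: upsample the residual and add details back."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        up = np.repeat(np.repeat(current, 2, axis=0), 2, axis=1)
        current = up[: detail.shape[0], : detail.shape[1]] + detail
    return current
```

Because each detail layer stores exactly what the upsampling discards, the reconstruction is lossless, which is why the pyramid can feed enhancement branches without destroying image content.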
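The combined text-image loss is described only at a high level. One common form in open-vocabulary detectors in the YOLO-World family is a region-text contrastive term added to the localization loss; the sketch below assumes that form, and `tau` and `alpha` are illustrative hyperparameters, not values from the paper:

```python
import numpy as np

def region_text_loss(region_emb, text_emb, labels, tau=0.07):
    """Contrastive region-text alignment: each region embedding is classified
    against the per-class text (prompt) embeddings via cosine similarity."""
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = r @ t.T / tau                       # (num_regions, num_classes)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

def combined_loss(region_emb, text_emb, labels, box_loss, alpha=1.0):
    """Combined text-image objective: localization term plus alignment term."""
    return box_loss + alpha * region_text_loss(region_emb, text_emb, labels)
```

Regions whose embeddings already match their class prompts contribute a near-zero alignment term, so the gradient concentrates on misaligned region-text pairs.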
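The dataset preparation mentions cropping and scaling without detail. A minimal sketch of one standard normalization step for YOLO-style detectors, letterbox resizing, is shown below; `size=640` and the pad value `114` are conventional defaults, not values from the paper:

```python
import numpy as np

def letterbox(img: np.ndarray, size: int = 640, pad_value: int = 114):
    """Scale the longer side to `size` (nearest-neighbor) and pad to a square
    canvas, preserving the aspect ratio of the original image."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]                 # nearest-neighbor resize
    canvas = np.full((size, size) + img.shape[2:], pad_value, dtype=img.dtype)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```

Keeping the aspect ratio fixed avoids distorting obstacle shapes, which matters when the detector must separate similar-looking machinery and stockpiles.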

     

