
Recognition method for transient falling images of coal and gangue in underground separation for solid backfill mining, integrating a multimodal large language model

  • Abstract: Solid backfill coal mining is a green mining method that balances resource recovery with ecological protection. Its core step, coal and gangue separation, is a prerequisite for the efficient operation of integrated underground mining, separation, and backfilling, and coal and gangue recognition is in turn the key technology for precise separation. Under complex underground conditions, however, recognition is hampered by difficult feature extraction and blurred boundary localization. To address this, a recognition method for transient falling images of coal and gangue in underground separation for solid backfill mining is proposed, which integrates a multimodal large language model (MLLM). First, an experimental platform for acquiring transient falling images was independently designed and built to simulate low-illumination, high-dust underground conditions, and high-speed cameras were used to capture images of falling coal and gangue under different conditions. The collected images were preprocessed with optimized algorithms that raise the brightness of low-illumination images and improve the quality of images taken in dusty environments; they were then annotated and augmented to build a dataset for training and testing coal and gangue recognition models. Second, to address the shortcomings of the conventional SegFormer model in recognizing coal and gangue boundaries, an efficient channel attention (ECA) mechanism was introduced and the loss function was optimized, yielding the ECSegFormer model. The MLLM was then fused with ECSegFormer to form the MLLM-ECSegFormer architecture: the multimodal large model Qwen-VL (7B) extracts the center coordinates of coal and gangue targets, a Gaussian heatmap built from those coordinates serves as a spatial attention mask, and the mask is incorporated into the ECSegFormer encoder in stages, so that multimodal prior knowledge interacts dynamically with image features. The experimental results show that integrating the MLLM significantly improved the performance of all of the classical image recognition models. In particular, MLLM-ECSegFormer reached a mean intersection over union (MIoU) of 95.50%, a mean pixel accuracy (MPA) of 98.92%, and an accuracy of 98.87%, and it significantly outperformed the classical image recognition models in recognition accuracy, model complexity, and recognition efficiency. Compared with the other image recognition models, MLLM-ECSegFormer also produced more continuous edges under complex conditions, and its segmentation accuracy for target regions was significantly higher than that of conventional models in scenes with dust interference or irregularly shaped coal and gangue. The results provide a new method for precise coal and gangue recognition, raise the intelligence level of solid backfill coal mining technology, and are of great significance for the green and intelligent mining of coal resources.
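
The step from target center coordinates to a spatial attention mask can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the Gaussian width, the fusion rule, or how the staged incorporation into the encoder is carried out, so `sigma`, `alpha`, the residual reweighting `f * (1 + alpha * mask)`, and both function names are assumptions made here for illustration. The center coordinates are hard-coded stand-ins for what the MLLM would return.

```python
import math


def gaussian_heatmap(height, width, centers, sigma):
    """Build a height x width mask in [0, 1] with a Gaussian bump at each (x, y) center."""
    mask = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                d2 = (x - cx) ** 2 + (y - cy) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                # Keep the maximum so overlapping targets never exceed 1.
                if value > mask[y][x]:
                    mask[y][x] = value
    return mask


def apply_spatial_attention(feature, mask, alpha=1.0):
    """Reweight a single-channel feature map as f * (1 + alpha * mask) (assumed fusion rule)."""
    return [
        [f * (1.0 + alpha * m) for f, m in zip(f_row, m_row)]
        for f_row, m_row in zip(feature, mask)
    ]


# Hypothetical (x, y) centers, standing in for coordinates extracted by the MLLM.
centers = [(2, 1), (5, 4)]
mask = gaussian_heatmap(6, 8, centers, sigma=1.0)

# A constant single-channel feature map, standing in for one encoder stage output.
feature = [[1.0] * 8 for _ in range(6)]
weighted = apply_spatial_attention(feature, mask)
```

In this sketch the residual form leaves features far from any target unchanged and amplifies features near a target by a factor of up to `1 + alpha`. For a multi-stage encoder, one plausible arrangement is to rebuild or resize the mask at each stage's resolution before applying it; this too is an assumption rather than a detail stated in the abstract.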

     
