Research on a dual-modal image fusion and segmentation method for coal and gangue under liquid intervention based on an improved U-Net model

张锦旺; 李依琪; 何庚; 王景政

doi:10.12438/cst.2025-1056

Abstract

To address the low recognition accuracy and high misjudgment rate for hard-to-distinguish coal and gangue types with small grayscale differences in high-dust environments, a liquid-intervention dual-modal image fusion and segmentation method based on an improved U-Net model (DSSA-Net) is proposed. The method introduces liquid intervention: liquid is sprayed onto the coal and gangue to enlarge the difference in their surface thermal radiation, which strengthens the ability of infrared images to discriminate hard-to-distinguish coal and gangue target regions. A dataset of 3 072 infrared-visible image pairs of hard-to-distinguish coal and gangue is constructed. DSSA-Net uses a dual-branch encoder to extract, separately, the infrared thermal-radiation features enhanced by liquid intervention and the texture-detail features of the visible images. Multi-scale hierarchical fusion and a dual-modal attention fusion module (DAFM) provide adaptive cross-modal feature fusion. A dual-path supervised decoder further optimizes feature aggregation: injecting high-level semantic supervision into the shallow layers alleviates vanishing gradients and improves the recovery of edge details. The study examines the role of liquid intervention in improving the discriminative ability of infrared images for hard-to-distinguish coal and gangue, and the performance of DSSA-Net in dual-modal image fusion segmentation. Experimental results show that dual-modal image fusion after liquid intervention gives the best segmentation, with a pixel accuracy (PA) of 91.21%, a mean intersection over union (mIoU) of 79.62%, and a mean Dice coefficient (mDice) of 88.49% on the test set. All of these metrics are significantly better than those obtained with single-modality input, with dual-modal fusion without liquid intervention, and with a baseline model that uses simple convolutional fusion. Compared with mainstream Transformer-based models such as TransUNet and Swin-Unet, DSSA-Net is better suited to fine-grained target segmentation in complex scenes: the IoU and Dice coefficient reach 78.47% and 87.94% for the coal class, and 71.95% and 83.68% for the gangue class. The proposed method improves high-precision recognition and segmentation of hard-to-distinguish coal and gangue, and offers an effective approach to coal-gangue recognition in such complex environments.
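The metrics quoted in the abstract (PA, mIoU, mDice, and the per-class IoU and Dice values) can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names `segmentation_metrics` and `dice_from_iou`, the class indexing (0 = background, 1 = coal, 2 = gangue), and the toy labels are all assumptions made for illustration; only the standard metric definitions are used.

```python
def segmentation_metrics(y_true, y_pred, num_classes):
    """Return (PA, per-class IoU, per-class Dice, mIoU, mDice).

    y_true and y_pred are flat sequences of integer class labels, one per pixel.
    """
    # conf[t][p] counts pixels whose true class is t and predicted class is p.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        conf[t][p] += 1

    total = sum(sum(row) for row in conf)
    correct = sum(conf[c][c] for c in range(num_classes))
    pa = correct / total  # pixel accuracy: correctly labelled pixels / all pixels

    ious, dices = [], []
    for c in range(num_classes):
        tp = conf[c][c]
        fp = sum(conf[t][c] for t in range(num_classes)) - tp  # predicted c, truly other
        fn = sum(conf[c]) - tp                                 # truly c, predicted other
        iou_den = tp + fp + fn
        dice_den = 2 * tp + fp + fn
        ious.append(tp / iou_den if iou_den else 0.0)
        dices.append(2 * tp / dice_den if dice_den else 0.0)

    return pa, ious, dices, sum(ious) / num_classes, sum(dices) / num_classes


def dice_from_iou(iou):
    """Per-class identity linking the two overlap metrics: Dice = 2*IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


# Toy example with assumed labels: 0 = background, 1 = coal, 2 = gangue.
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 2, 2, 0]
pa, ious, dices, miou, mdice = segmentation_metrics(y_true, y_pred, 3)
print(f"PA={pa:.4f}  mIoU={miou:.4f}  mDice={mdice:.4f}")
```

For a single class, Dice and IoU are tied by Dice = 2·IoU / (1 + IoU). The coal values reported in the abstract fit this relation: an IoU of 78.47% corresponds to a Dice coefficient of about 87.94%.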
