Abstract:
To address the low recognition accuracy and high misjudgment rate in identifying difficult-to-distinguish coal-gangue types with small grayscale differences in high-dust environments, a liquid-intervention bimodal image fusion and segmentation method based on an improved U-Net model (DSSA-Net) is proposed. The method introduces a liquid-intervention approach in which liquid is sprayed uniformly to increase the thermal radiation differences on the surface of coal gangue, thereby making the target regions of difficult-to-distinguish coal gangue easier to discriminate in infrared images. A dataset of 3,072 infrared-visible image pairs of difficult-to-distinguish coal gangue is constructed. DSSA-Net adopts a dual-branch encoder to separately extract infrared thermal radiation features and visible texture detail features enhanced by liquid intervention. Adaptive cross-modal feature fusion is achieved through multi-scale hierarchical fusion and a bimodal attention fusion module (DAFM). In addition, a dual-path supervised decoder optimizes the feature aggregation process: by injecting high-level semantic supervision into the shallow layers of the network, it alleviates the vanishing-gradient problem and improves the recovery of edge details. Two aspects are investigated: the role of liquid intervention in enhancing the discriminative ability of infrared images for difficult-to-distinguish coal gangue, and the application of DSSA-Net to bimodal image fusion and segmentation. The experimental results show that segmentation based on liquid-intervention bimodal image fusion performs best, achieving a pixel accuracy (PA) of 91.21%, a mean intersection-over-union (mIoU) of 79.62%, and a mean Dice coefficient (mDice) of 88.49% on the test set.
These results significantly outperform those obtained with single-modal input, with bimodal fusion without liquid intervention, and with the baseline model using simple convolutional fusion. Compared with mainstream Transformer-based models such as TransUNet and Swin-Unet, DSSA-Net is better suited to fine target segmentation in complex scenes. The IoU and Dice coefficient reach 78.47% and 87.94% for the coal category and 71.95% and 83.68% for the gangue category. The proposed method effectively improves high-precision recognition and segmentation of difficult-to-distinguish coal gangue, providing an effective approach to coal-gangue recognition in such complex environments.
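For readers less familiar with the reported metrics, the following is a minimal sketch of how PA, per-class IoU and Dice, mIoU, and mDice are conventionally computed from a pixel-level confusion matrix. It is not the authors' evaluation code: the function names, the confusion-matrix layout (rows are true classes, columns are predicted classes), and the toy three-class counts are all assumptions made for illustration.

```python
def segmentation_metrics(confusion):
    """Compute PA, per-class IoU/Dice, mIoU and mDice.

    Assumed layout: confusion[i][j] is the number of pixels whose true
    class is i and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))

    ious, dices = [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # true k, predicted otherwise
        fp = sum(confusion[i][k] for i in range(n)) - tp   # predicted k, true otherwise
        iou_den = tp + fp + fn
        dice_den = 2 * tp + fp + fn
        ious.append(tp / iou_den if iou_den else 0.0)
        dices.append(2 * tp / dice_den if dice_den else 0.0)

    return {
        "PA": correct / total if total else 0.0,
        "IoU": ious,
        "Dice": dices,
        "mIoU": sum(ious) / n,
        "mDice": sum(dices) / n,
    }


def dice_from_iou(iou):
    """Per-class identity: Dice = 2 * IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


# Hypothetical toy counts for three classes (e.g. background, coal, gangue).
toy_confusion = [
    [50, 2, 3],
    [4, 30, 6],
    [1, 5, 20],
]
metrics = segmentation_metrics(toy_confusion)
print(metrics)
```

Because per-class Dice and IoU are derived from the same true-positive, false-positive, and false-negative counts, they satisfy Dice = 2·IoU / (1 + IoU). Applying this identity to the reported per-class IoU values (78.47% for coal, 71.95% for gangue) gives Dice values that agree with the reported 87.94% and 83.68% to within about 0.01 percentage points, which is consistent with rounding.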