
Underground coal-rock image recognition using Swin-UNet with agent attention mechanism

Abstract: Coal-rock images captured under complex underground mining conditions suffer from low illumination, high noise, and motion blur, which make semantic segmentation difficult. To address these problems, this paper proposes Agent Swin-UNet, an improved coal-rock semantic segmentation model that integrates an Agent Attention mechanism into Swin-UNet (a U-Net built on the Shifted-Window Transformer). The model uses Swin Transformer as its backbone. Its hierarchical window-based and shifted-window multi-head self-attention (W-MSA/SW-MSA) establishes long-range illumination dependencies, alleviating the loss of detail in dark regions and the degradation of local features. An Agent Attention module is embedded in the skip connections between the encoder and decoder. The module combines three cooperating mechanisms. First, agent tokens realize an “aggregation-broadcast” style of feature interaction that reduces the computational complexity from O(N²) to O(Nn), where N is the number of tokens and n (n ≪ N) is the number of agent tokens, so global semantic modeling is preserved at a much lower computational cost. Second, a spatially-aware bias improves the model's adaptability to the noise distribution and suppresses unstructured interference. Third, a depthwise separable convolution (DWC) strengthens local texture reconstruction, improving boundary recognition accuracy and fine-detail recovery. To counter the severe imbalance between foreground and background pixels in coal-rock images, a composite loss combining cross-entropy loss, Dice loss, and multi-scale structural similarity (MS-SSIM) loss is constructed. This hybrid supervision optimizes training along three dimensions, namely classification consistency, regional overlap, and structural similarity, improving semantic coherence and boundary completeness under class imbalance.
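
The backbone's W-MSA/SW-MSA attention is restricted to local windows, and the shifted variant lets information cross window borders. A minimal sketch of the token grouping only, assuming a square token grid, a window size M that divides the grid, and a shift of M // 2 as in the original Swin Transformer; the function names are hypothetical and no attention is computed here.

```python
# Sketch of how W-MSA and SW-MSA group tokens into windows (illustrative only).
# Assumptions: H x W grid given as a list of rows, window size m divides H and W.

def window_partition(grid, m):
    """Split an H x W grid into non-overlapping m x m windows.

    Returns a list of windows, each a flat row-major list of m*m tokens.
    W-MSA computes self-attention independently inside each window.
    """
    h, w = len(grid), len(grid[0])
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([grid[r][c]
                            for r in range(top, top + m)
                            for c in range(left, left + m)])
    return windows


def cyclic_shift(grid, s):
    """Roll the grid up and left by s positions (like torch.roll with shifts=(-s, -s)).

    SW-MSA partitions this shifted grid, so each new window mixes tokens
    that belonged to different windows in the previous block.
    """
    h, w = len(grid), len(grid[0])
    return [[grid[(r + s) % h][(c + s) % w] for c in range(w)] for r in range(h)]
```

On a 4 x 4 grid labelled 0 to 15 with 2 x 2 windows, W-MSA uses the windows [0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]; after `cyclic_shift(grid, 1)` the first window holds 5, 6, 9, 10, one token from each original window. Real Swin implementations additionally mask attention inside windows that contain wrapped-around tokens (such as the window 13, 14, 1, 2); that mask is omitted in this sketch.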
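
The O(N²) to O(Nn) reduction comes from routing attention through n agent tokens: the agents first aggregate information from all N tokens, then broadcast it back. A plain-Python sketch of that two-step scheme, assuming the agents are obtained by average-pooling the queries (a common choice, not confirmed by the abstract); the spatially-aware bias and the DWC branch are deliberately left out, and all names are hypothetical.

```python
import math

def matmul(a, b):
    """Multiply a (p x q) matrix by a (q x r) matrix; both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax_rows(a):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in a:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out

def softmax_attention(q, k, v):
    """softmax(Q K^T / sqrt(d)) V; the score matrix is len(q) x len(k)."""
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)

def make_agents(q, n):
    """Assumed agent construction: average-pool the N queries into n contiguous groups.

    Requires n to divide N.
    """
    size = len(q) // n
    return [[sum(col) / size for col in zip(*q[i * size:(i + 1) * size])]
            for i in range(n)]

def agent_attention(q, k, v, n):
    """Aggregation-broadcast attention through n agent tokens (bias and DWC omitted)."""
    a = make_agents(q, n)                      # n x d agent tokens
    v_agent = softmax_attention(a, k, v)       # aggregation: n x N score matrix
    return softmax_attention(q, a, v_agent)    # broadcast: N x n score matrix

def score_entries(N, n=None):
    """Attention-score entries: N*N for softmax attention, N*n + n*N with n agents."""
    return N * N if n is None else 2 * N * n
```

With illustrative sizes N = 56 x 56 = 3136 tokens and n = 49 agents, `score_entries(3136)` gives 9,834,496 entries while `score_entries(3136, 49)` gives 307,328, a 32-fold reduction that grows linearly with N instead of quadratically.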
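
A depthwise separable convolution factorizes a standard k x k convolution into a per-channel (depthwise) filter and a 1 x 1 (pointwise) channel mix, which is why it adds local texture modeling cheaply. A plain-Python sketch with zero padding and stride 1; the function names and the channel count of 96 used below are illustrative assumptions, and how the paper wires this branch into its module is not reproduced here.

```python
def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 convolution (cross-correlation) with zero padding, stride 1.

    x: C x H x W nested lists; kernels: C x 3 x 3, one filter per channel,
    so channels never mix in this step.
    """
    out = []
    for ch, k in zip(x, kernels):
        h, w = len(ch), len(ch[0])
        out_ch = [[0.0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                acc = 0.0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < h and 0 <= cc < w:  # zero padding outside
                            acc += ch[rr][cc] * k[dr + 1][dc + 1]
                out_ch[r][c] = acc
        out.append(out_ch)
    return out

def pointwise_conv(x, weights):
    """1x1 convolution mixing C_in channels into C_out channels at every pixel.

    weights: C_out x C_in.
    """
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(wt[i] * x[i][r][c] for i in range(len(x))) for c in range(w)]
             for r in range(h)]
            for wt in weights]

def param_counts(c_in, c_out, k=3):
    """Weights (no bias) of a standard k x k conv vs. a depthwise separable one."""
    standard = c_in * c_out * k * k
    separable = c_in * k * k + c_in * c_out
    return standard, separable
```

For 96 input and 96 output channels, `param_counts(96, 96)` returns (82944, 10080): the separable form needs roughly one eighth of the weights of a standard 3 x 3 convolution.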
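
The composite loss adds a pixel-wise term (cross-entropy), a region-overlap term (Dice) and a structural term (MS-SSIM). A minimal sketch for a binary map, assuming a single foreground class, equal weights, and a simplified MS-SSIM; the weights `w_ce`, `w_dice`, `w_ssim` and all function names are hypothetical. The standard MS-SSIM multiplies weighted contrast-structure terms over scales and uses a sliding Gaussian window, whereas this sketch averages one global SSIM per scale.

```python
import math

def bce_loss(p, g, eps=1e-7):
    """Mean binary cross-entropy; p holds predicted foreground probabilities, g a 0/1 mask."""
    total, n = 0.0, 0
    for pr, gr in zip(p, g):
        for pv, gv in zip(pr, gr):
            pv = min(max(pv, eps), 1 - eps)  # clamp to avoid log(0)
            total -= gv * math.log(pv) + (1 - gv) * math.log(1 - pv)
            n += 1
    return total / n

def dice_loss(p, g, eps=1e-6):
    """1 - soft Dice coefficient; insensitive to how many background pixels there are."""
    inter = sum(pv * gv for pr, gr in zip(p, g) for pv, gv in zip(pr, gr))
    s = sum(map(sum, p)) + sum(map(sum, g))
    return 1 - (2 * inter + eps) / (s + eps)

def global_ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """SSIM computed once over the whole map (no sliding window)."""
    xs = [v for row in x for v in row]
    ys = [v for row in y for v in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    vx = sum((a - mx) * (a - mx) for a in xs) / n
    vy = sum((b - my) * (b - my) for b in ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))

def avg_pool2(x):
    """2x2 average pooling; assumes even height and width."""
    return [[(x[r][c] + x[r][c + 1] + x[r + 1][c] + x[r + 1][c + 1]) / 4
             for c in range(0, len(x[0]), 2)]
            for r in range(0, len(x), 2)]

def ms_ssim_loss(p, g, scales=3):
    """Simplified multi-scale SSIM loss: 1 - mean global SSIM over `scales` resolutions.

    Height and width must be divisible by 2 ** (scales - 1).
    """
    vals = []
    for i in range(scales):
        vals.append(global_ssim(p, g))
        if i < scales - 1:
            p, g = avg_pool2(p), avg_pool2(g)
    return 1 - sum(vals) / len(vals)

def composite_loss(p, g, w_ce=1.0, w_dice=1.0, w_ssim=1.0):
    """Hypothetical weighting: w_ce * CE + w_dice * Dice + w_ssim * MS-SSIM loss."""
    return w_ce * bce_loss(p, g) + w_dice * dice_loss(p, g) + w_ssim * ms_ssim_loss(p, g)
```

The Dice term is the one that directly addresses class imbalance: it is a ratio over foreground pixels only, so a model cannot lower it by predicting background everywhere, unlike the averaged cross-entropy.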
Experiments on the Shaanxi-Shanxi-Hebei Structural Coal Dataset show that Agent Swin-UNet reaches 91.26% mIoU and 88.81% mPA on the standard test set, outperforming Segmenter, DeepLabv3, and the baseline Swin-UNet. Under noise with an intensity of 0.05, its mIoU remains at 84.14%, indicating strong noise robustness. Ablation experiments further confirm that the Agent Attention module is the main source of the performance improvement, contributing the core gains especially in high-noise scenarios (noise intensity > 0.05). The proposed method provides a robust and efficient technical solution for rapid coal-rock segmentation and intelligent excavation in complex underground environments.
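
Both reported metrics derive from a per-pixel confusion matrix: IoU_i = TP_i / (TP_i + FP_i + FN_i) and PA_i = TP_i / (TP_i + FN_i), each averaged over classes. A sketch with hypothetical helper names; skipping classes that are absent from both prediction and ground truth follows common toolkit practice and is an assumption about how the reported figures were averaged.

```python
def confusion_matrix(pred, gt, num_classes):
    """cm[i][j] = number of pixels whose true class is i and predicted class is j."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for pr, gr in zip(pred, gt):
        for p, g in zip(pr, gr):
            cm[g][p] += 1
    return cm

def miou_mpa(cm):
    """Mean IoU and mean pixel accuracy over classes that actually occur."""
    k = len(cm)
    ious, pas = [], []
    for i in range(k):
        tp = cm[i][i]
        gt_i = sum(cm[i])                         # pixels whose true class is i (TP + FN)
        pred_i = sum(cm[r][i] for r in range(k))  # pixels predicted as i (TP + FP)
        union = gt_i + pred_i - tp
        if union:
            ious.append(tp / union)
        if gt_i:
            pas.append(tp / gt_i)
    return sum(ious) / len(ious), sum(pas) / len(pas)
```

For a 2 x 3 ground truth [[0, 0, 1], [0, 1, 1]] and prediction [[0, 1, 1], [0, 1, 1]], class 0 has IoU 2/3 and PA 2/3, class 1 has IoU 3/4 and PA 1, so mIoU = 17/24 ≈ 0.708 and mPA = 5/6 ≈ 0.833. Because mPA ignores false positives while mIoU penalizes them, the two can rank models differently.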
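
The abstract does not state which noise model "intensity 0.05" refers to. Two common readings are sketched here as assumptions, for images scaled to [0, 1]: additive zero-mean Gaussian noise whose variance equals the intensity (the convention of scikit-image's `random_noise(mode='gaussian', var=...)`), and salt-and-pepper noise where the intensity is the probability that a pixel is corrupted (`mode='s&p', amount=...`). The function names are hypothetical.

```python
import math
import random

def add_gaussian_noise(img, var, seed=None):
    """Add zero-mean Gaussian noise with variance `var`, then clip to [0, 1]."""
    rng = random.Random(seed)
    sigma = math.sqrt(var)
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row] for row in img]

def add_salt_pepper(img, amount, seed=None):
    """With probability `amount`, replace a pixel by 0 or 1 (equally likely)."""
    rng = random.Random(seed)
    out = []
    for row in img:
        new_row = []
        for v in row:
            if rng.random() < amount:
                new_row.append(1.0 if rng.random() < 0.5 else 0.0)
            else:
                new_row.append(v)
        out.append(new_row)
    return out
```

A robustness sweep would corrupt each test image at several intensities (for example 0.01 to 0.1), run the trained model, and record mIoU per intensity; fixing `seed` keeps the corrupted test set identical across the models being compared.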
