Coal mine underground personnel tracking method based on self-attention and matching optimization

马天; 石炜璐; 苟娜娜; 毛清华; 牛雅莉; 李远成

doi:10.12438/cst.2025-0936

Abstract

Existing personnel tracking methods perform poorly in the complex underground coal mine environment, which is characterized by lighting variations, dust, and frequent occlusions among visually similar personnel: features are unstable, appearance discrimination is weak, and trajectory matching is inaccurate. To address these problems, a personnel tracking method for underground coal mines based on self-attention and matching optimization is proposed. Built on the Transformer encoder-decoder structure, the method performs end-to-end online tracking of target bounding boxes, categories, and identity IDs in four steps: frame-level feature extraction, self-attention encoding, query entity decoding, and prediction mapping. First, an adaptive dual-domain synergy module is designed for the feature extraction stage. Its channel-adaptive weighting mechanism and spatial-aware weighting mechanism work together to dynamically adjust the weight distribution across channels and spatial locations, which enhances the feature map's ability to discriminate subtle differences between targets. Second, a progressive downsampling attention fusion module is placed between feature extraction and the encoder. It adopts a multi-level feature fusion strategy: by summing multi-level features element-wise, it strengthens high-level feature representation while retaining low-level detail, so that the edge details and positional changes of targets in dim coal mine scenes are captured accurately. Finally, for matching tracked targets with predicted boxes, the traditional Hungarian matching algorithm is improved by adopting a locally optimal matching strategy. By dynamically adjusting the cost matrix and optimizing the matching strategy, the method reduces the mismatch rate and alleviates tracking drift, which improves the accuracy of coal mine personnel tracking.

Experimental results show that the proposed method achieves better results on both a self-built coal mine dataset and a public dataset, with multiple object tracking accuracy (MOTA) and identification F1 score (IDF1) reaching 86.4% and 77%, respectively. Compared with the Trackformer network, MOTA and IDF1 improve by 4.2 and 5.4 percentage points, respectively.
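The abstract does not spell out how the locally optimal matching strategy works, so the following is only a rough illustration of the general idea, not the authors' algorithm. It contrasts a globally optimal assignment (the result the Hungarian algorithm computes, found here by brute force) with a greedy, locally optimal assignment that accepts the cheapest remaining pair first and rejects pairs above a threshold. The function names, the gating threshold `max_cost`, and the cost values are all assumptions made for this example.

```python
from itertools import permutations


def global_match(cost):
    """Globally optimal assignment by brute force.

    This is the result the Hungarian algorithm computes efficiently;
    brute force is used here only because the example matrix is tiny
    and square.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[t][p[t]] for t in range(n)),
    )
    return [(t, best[t]) for t in range(n)]


def greedy_match(cost, max_cost):
    """Locally optimal (greedy) assignment with a cost gate (assumed rule).

    Repeatedly accepts the cheapest remaining (track, detection) pair
    and stops once the cheapest remaining cost exceeds max_cost.
    """
    pairs = sorted(
        (cost[t][d], t, d)
        for t in range(len(cost))
        for d in range(len(cost[t]))
    )
    used_tracks, used_dets, matches = set(), set(), []
    for c, t, d in pairs:
        if c > max_cost:
            break  # every remaining pair is at least this expensive
        if t in used_tracks or d in used_dets:
            continue  # track or detection already assigned
        matches.append((t, d))
        used_tracks.add(t)
        used_dets.add(d)
    return sorted(matches)


# Hypothetical cost matrix: rows are existing tracks, columns are new
# predictions; lower cost means a better match.
cost = [
    [0.10, 0.20, 0.90],
    [0.15, 0.90, 0.95],
    [0.90, 0.85, 0.80],
]

print(global_match(cost))                # [(0, 1), (1, 0), (2, 2)]
print(greedy_match(cost, max_cost=0.5))  # [(0, 0)]
```

In this toy case the global assignment gives up the best pair (0, 0) to lower the total cost and forces track 2 onto a poor match at cost 0.80, while the gated greedy rule keeps the best pair and leaves the other tracks unmatched. Whether that trade-off reduces mismatches in practice depends on how the cost matrix is built and adjusted, which the abstract does not detail.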
