
Multimodal fusion method for open-set foreign object detection in raw coal sorting scenarios

  • Abstract: Raw coal sorting begins with identifying and removing foreign objects such as large gangue blocks, iron wire, and woven bags, so that they neither disrupt downstream processing nor cause safety accidents. Current coal foreign-object detection algorithms are designed mainly for closed-set scenarios with known object categories and detect unknown targets poorly, particularly anchor rods, novel support materials, and other objects with complex appearance and uncertain semantics. A detection model that handles known and unknown foreign objects simultaneously is therefore urgently needed. This paper proposes an open-set detection method for coal foreign objects based on multimodal fusion. First, a text-image bimodal feature extraction architecture is designed on the basis of the DINO network to obtain more class-discriminative text and visual features; a Path Aggregation Feature Pyramid Network (PAFPN) is introduced with a multi-layer feature extraction strategy that combines deep semantic features with shallow spatial detail, strengthening the perception of small-scale coal foreign objects and improving detection accuracy. Second, a multimodal feature fusion module built on self-attention and cross-attention enables deep interaction and efficient fusion of text and visual features, and a language-guided query selection mechanism establishes correspondences between text descriptions of arbitrary categories and visual queries, improving the semantic consistency of features and cross-category generalization. Finally, a vision-text multimodal decoding module inserts a text guidance mechanism into the query update stage of every layer, so that the learnable queries are aligned with language features before they interact with image features, which improves the accuracy and robustness of multimodal feature alignment. An open, dynamic environment with multi-category combinations was constructed from a self-built coal foreign object dataset, and systematic experiments were conducted. On known-category detection, the proposed method achieves higher mAP@0.5 than all compared methods at every openness level; on unknown-category detection, its unknown-class recall reaches 41.24%, 52.26%, and 57.13% across the different openness tasks, confirming its effectiveness under zero-shot conditions. The method can thus detect unknown categories of coal foreign objects and provides effective technical support for open-set detection of foreign objects in coal.
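The abstract names a language-guided query selection mechanism but does not give its exact rule. The sketch below illustrates one common form of the idea, as used in open-set detectors such as Grounding DINO: each image token is scored by its best dot-product match against any text token, and the top-scoring tokens seed the decoder queries. The function names, the dot-product similarity, and the toy feature vectors are all assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_queries(
    image_tokens: Sequence[Sequence[float]],
    text_tokens: Sequence[Sequence[float]],
    num_queries: int,
) -> List[int]:
    """Return indices of the image tokens most similar to the text prompt.

    Hypothetical helper: the similarity measure and the max-over-text
    reduction are assumptions, not the paper's confirmed rule.
    """
    if not text_tokens:
        raise ValueError("at least one text token is required")
    # Score each image token by its best match against any text token.
    scores = [
        max(dot(img, txt) for txt in text_tokens) for img in image_tokens
    ]
    # Rank indices by descending score; ties go to the lower index.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:num_queries]


# Toy example: four 2-D image tokens and two 2-D text tokens.
image_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_tokens = [[1.0, 0.0], [0.0, 0.2]]
selected = select_queries(image_tokens, text_tokens, 2)
print(selected)
```

Because selection depends only on similarity to the text embedding, a prompt describing a category unseen during training can still pull out matching image regions, which is the property that open-set detection relies on.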

     
