CAO Xiangang, LIU Hang, LIU Jiahui, et al. Multimodal fusion method for foreign object open-set detection in raw coal sorting scenario[J]. Coal Science and Technology, 2026, 54(1): 1–11. DOI: 10.12438/cst.2025-1228

Multimodal fusion method for foreign object open-set detection in raw coal sorting scenario

  • In raw coal sorting, large gangue blocks, steel wires, woven bags, and other foreign objects must first be identified and removed so that they do not disrupt subsequent processing stages or cause safety accidents. Existing coal foreign-object detection algorithms are designed primarily for closed-set scenarios and focus on known object categories; they have limited ability to detect and recognize targets of unknown categories, particularly anchor rods, novel support materials, and other objects with complex appearances and semantic uncertainty. Detection models that can handle known and unknown foreign objects simultaneously are therefore urgently needed. First, a text-image bimodal feature extraction architecture is constructed based on the DINO network, the Path Aggregation Feature Pyramid Network (PAFPN) is introduced, and a multi-layer feature extraction strategy is adopted to enhance the perception of small-scale foreign objects. Second, a multimodal feature fusion module is built on self-attention and cross-attention mechanisms, and a language-guided query selection mechanism is incorporated to achieve deep interaction between text and visual features, improving the semantic consistency of the features and their cross-category generalization. Finally, a vision-text multimodal decoding module is designed that inserts a text guidance mechanism at the query update stage of each layer to improve the accuracy and robustness of multimodal feature alignment. An open, dynamic environment with multi-category combinations is constructed from a self-built coal foreign-object dataset, and systematic experiments are conducted. The results show that the proposed method outperforms all baselines in mAP@0.5 for known-category detection across openness levels and attains unknown-category recall rates of 41.24%, 52.26%, and 57.13%, confirming its zero-shot effectiveness. The proposed method is thus effective at detecting foreign objects of unknown types in coal and provides technical support for open-set detection of coal foreign objects.
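
The abstract does not spell out how the language-guided query selection works, so the following is only a minimal sketch under an assumed formulation: each image token is scored by its maximum dot-product similarity to any text token, and the highest-scoring tokens are kept as decoder queries. The function names `dot` and `select_queries`, the toy feature values, and the tie-breaking rule are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def select_queries(
    image_feats: List[List[float]],
    text_feats: List[List[float]],
    num_queries: int,
) -> List[int]:
    """Return indices of the image tokens most similar to any text token.

    Assumed scoring rule (not taken from the paper): each image token is
    scored by its maximum dot-product similarity over all text tokens.
    The top `num_queries` indices are returned in descending score order,
    with ties broken by the lower index.
    """
    scores = [max(dot(img, txt) for txt in text_feats) for img in image_feats]
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return ranked[:num_queries]


# Toy example: 4 image tokens, 2 text tokens, feature dimension 2.
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_feats = [[1.0, 0.0], [0.0, 2.0]]
selected = select_queries(image_feats, text_feats, num_queries=2)
print(selected)
```

In this toy case the second image token aligns most strongly with the second text token, so it is selected first. In a real detector the selected tokens would initialize the decoder queries, which is how the text prompt can steer detection toward categories not seen during training.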