Abstract:
In open-pit mining operations, safe mining is key to keeping production processes running smoothly. However, obstacle detection in open-pit mines faces severe challenges posed by complex environments, and traditional visual algorithms struggle to meet the requirements of these production scenarios. The reasons include complex road environments, road conditions that are hard to distinguish under low light, and frequent harsh weather, all of which seriously degrade detection accuracy; moreover, in the cluttered mine background, mechanical equipment, piles of material, and undulating terrain are intertwined, and obstacles come in many types. These factors expose the limitations of traditional algorithms, and the adaptability and robustness of existing improved algorithms in open-pit mine environments still need to be strengthened. To detect obstacles in open-pit mines accurately and efficiently, ensure the safety of mine transportation, and break through the bottlenecks of traditional algorithms, a cross-modal detection model, Mamba-YOLO-World, is proposed based on the YOLO architecture. The model adopts multimodal information, combining text with images to improve detection accuracy and obstacle detection performance under complex conditions. It introduces the MambaFusion-PAN architecture to optimize the fusion of multimodal data and to extract key features of road obstacles accurately; by means of a state-space model, it captures the intrinsic correlations between modalities, enabling comprehensive detection of the complex production environment in open-pit mines. In addition, it integrates PENet: the input image is decomposed through a Laplacian pyramid into sub-images at different resolutions to capture global context, while edge branches enhance textures, greatly improving detection under dim lighting. A combined text-image loss function is also introduced to raise training accuracy. After 6,000 images were screened and organized, corresponding text descriptions were created; together they form the foundation of the dataset. Image enhancement, noise reduction, cropping, and scaling were applied to provide high-quality data for the experiments. In the experiments, the Mamba-YOLO-World model integrated with PENet was compared with mainstream algorithms such as YOLOv8x and YOLOv9e. The results show that the model holds clear advantages on core indicators including mAP50, precision, and recall, with mAP50 reaching 64.8%. Further analysis reveals that PENet markedly improves low-light detection, and the combined loss function strengthens training on multimodal data. In conclusion, although the Mamba-YOLO-World model integrated with PENet has slightly higher computational complexity, it offers a clear accuracy advantage; compared with traditional algorithms, multimodal data fusion enhances its adaptability and robustness, providing a cross-modal obstacle detection method for open-pit mining.
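The abstract credits a state-space model with capturing correlations between the text and image modalities. As a rough illustration of the idea only (not the paper's actual MambaFusion-PAN, whose selective-scan details are not given here), the sketch below runs a plain linear state-space recurrence over concatenated image and text tokens; the class name, dimensions, and initialization are all hypothetical.

```python
import torch
import torch.nn as nn

class SimpleSSMFusion(nn.Module):
    """Illustrative linear state-space scan over a fused token sequence.

    Hypothetical simplification of Mamba-style cross-modal fusion:
    image and text tokens are concatenated and scanned recurrently
    (h_t = A h_{t-1} + B x_t, y_t = C h_t), so each output token can
    carry context from both modalities.
    """

    def __init__(self, dim, state_dim=16):
        super().__init__()
        self.A = nn.Parameter(torch.eye(state_dim) * 0.9)  # state transition
        self.B = nn.Linear(dim, state_dim, bias=False)     # input projection
        self.C = nn.Linear(state_dim, dim, bias=False)     # output projection

    def forward(self, image_tokens, text_tokens):
        # image_tokens: (batch, N_img, dim); text_tokens: (batch, N_txt, dim)
        x = torch.cat([image_tokens, text_tokens], dim=1)
        batch, length, _ = x.shape
        h = x.new_zeros(batch, self.A.shape[0])
        outputs = []
        for t in range(length):
            h = h @ self.A.T + self.B(x[:, t])  # recurrent state update
            outputs.append(self.C(h))
        return torch.stack(outputs, dim=1)      # fused tokens, same length
```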
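PENet's contribution is described as decomposing the input into multi-resolution sub-images via a Laplacian pyramid, with the coarsest level carrying global context and the band-pass levels carrying edge and texture detail. A minimal sketch of that decomposition with OpenCV follows; the function name and level count are illustrative, not PENet's actual implementation.

```python
import cv2
import numpy as np

def laplacian_pyramid(image, levels=3):
    """Decompose an image into a Laplacian pyramid.
    Each band-pass level holds high-frequency detail (edges, texture);
    the final entry is the low-resolution residual with global context."""
    pyramid = []
    current = image.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)   # high-frequency sub-image
        current = down
    pyramid.append(current)            # coarsest level: global context
    return pyramid
```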
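The combined text-image loss is not specified in the abstract. Assuming a YOLO-World-style region-text contrastive term added on top of the usual detection losses, a minimal sketch might look like the following; the function name, temperature value, and the decomposition of the total objective are assumptions.

```python
import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_embeds, text_embeds, targets, tau=0.05):
    """Align detected-region embeddings with class-name text embeddings.
    region_embeds: (N, D), text_embeds: (C, D),
    targets: (N,) index of the matching text entry for each region."""
    region_embeds = F.normalize(region_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = region_embeds @ text_embeds.t() / tau  # cosine similarity / temperature
    return F.cross_entropy(logits, targets)

# Hypothetical total objective: contrastive term plus standard detection losses.
# total_loss = region_text_contrastive_loss(r, t, y) + box_loss + objectness_loss
```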
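The dataset preparation names four steps: image enhancement, noise reduction, cropping, and scaling. One possible chain implementing those steps with standard OpenCV operations is sketched below; the specific operators (non-local means denoising, CLAHE contrast enhancement, center crop) and the 640-pixel target size are assumptions, not the paper's stated pipeline.

```python
import cv2

def preprocess(path, size=640):
    """Illustrative preprocessing chain for the steps named in the abstract."""
    img = cv2.imread(path)
    img = cv2.fastNlMeansDenoisingColored(img, None, 10, 10, 7, 21)  # noise reduction
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))                          # contrast enhancement
    img = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    img = img[top:top + side, left:left + side]                      # center crop
    return cv2.resize(img, (size, size))                             # scaling
```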