[1] PENG Yingqiong, RAO Yuxiang, LIAO Muxin, et al. Construction of an unsupervised domain adaptive image classification model guided by cross-attention mechanism and its application in fine-grained fruit fly recognition[J]. Jiangsu Journal of Agricultural Sciences, 2026, 42(04): 756-762. [doi:10.3969/j.issn.1000-4440.2026.04.012]

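Construction of an unsupervised domain adaptive image classification model guided by cross-attention mechanism and its application in fine-grained fruit fly recognition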

Jiangsu Journal of Agricultural Sciences [ISSN: 1006-6977 / CN: 61-1281/TN]

Volume:
42
Issue:
No. 04, 2026
Pages:
756-762
Section:
Agricultural Information Engineering
Publication date:
2026-04-30

Article Info

Title:
Construction of an unsupervised domain adaptive image classification model guided by cross-attention mechanism and its application in fine-grained fruit fly recognition
Author(s):
PENG Yingqiong 1,2; RAO Yuxiang 1,2; LIAO Muxin 3; ZHONG Wenbo 1
(1. School of Software, Jiangxi Agricultural University, Nanchang 330045, China; 2. Key Laboratory of Agricultural Information Technology of Colleges and Universities in Jiangxi Province, Nanchang 330000, China; 3. School of Computer Science and Engineering, Jiangxi Agricultural University, Nanchang 330045, China)
Keywords:
Bactrocera; unsupervised domain adaptation; attention mechanism; image classification
CLC number:
Q969.456.8
DOI:
doi:10.3969/j.issn.1000-4440.2026.04.012
Document code:
A
Abstract:
Existing pest image classification models often suffer performance degradation in cross-domain recognition. To address this issue, this study proposed FC-DroNet, an unsupervised domain adaptive image classification model guided by a focal-region-based cross-attention mechanism. The model first introduced mask processing into feature extraction and used a cascaded cross-attention module to fuse horizontal, vertical, and global spatial features, thereby enhancing its ability to capture fine-grained local features. In addition, a consistency constraint mechanism was introduced to suppress overfitting to the source domain and thus improve cross-domain generalization. A dataset, FD4Set, containing images of four fruit fly species, namely Bactrocera cucurbitae, Bactrocera scutellata, Bactrocera tau, and Bactrocera dorsalis (Hendel), was constructed to verify the performance of FC-DroNet. On the test set, FC-DroNet reached a precision of 99.41% and an F1 score of 99.22%, both higher than those of the ResNet50, AlexNet, VGG-16, LeNet-5, ConvNeXt, and MobileViT models. These results provide technical support for the intelligent identification of field pests.
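
The precision and F1 figures reported above can be illustrated with a minimal Python sketch for a four-class problem. It assumes macro averaging over classes, which the abstract does not state; the function name `macro_precision_f1` and the toy labels are hypothetical and are not taken from FD4Set or from the authors' code.

```python
from collections import Counter


def macro_precision_f1(y_true, y_pred):
    """Return (macro precision, macro F1) over all classes seen in either list.

    Macro averaging is an assumption here; the abstract does not say how
    the per-class scores were aggregated.
    """
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, f1_scores = [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        f1_scores.append(f1)

    n = len(classes)
    return sum(precisions) / n, sum(f1_scores) / n


# Toy labels (hypothetical), one string per image.
y_true = ["cucurbitae", "cucurbitae", "scutellata", "tau", "dorsalis", "dorsalis"]
y_pred = ["cucurbitae", "tau", "scutellata", "tau", "dorsalis", "dorsalis"]
precision, f1 = macro_precision_f1(y_true, y_pred)
print(f"macro precision = {precision:.4f}, macro F1 = {f1:.4f}")
```

Under macro averaging each species contributes equally regardless of how many images it has, which is one reasonable choice for a small multi-class dataset; a weighted or micro average would give different numbers.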

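Memo:
Received: 2025-10-15. Funding: National Natural Science Foundation of China (62262028); General Program of the Jiangxi Provincial Natural Science Foundation (20242BAB25082). About the author: PENG Yingqiong (1979-), female, from Pingxiang, Jiangxi; master's degree, professor, master's supervisor; research interests: agricultural informatization and image processing. (E-mail) jneyq@jxau.edu.cn. Corresponding author: ZHONG Wenbo, (E-mail) jneyq_pyq@jxau.edu.cn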
Last Update: 2026-05-11