[1] XIE Wen-yong, CHAI Qin-qin, LIN Ni, et al. Discrimination of aristolochic acid and its analogues based on stacking ensemble learning[J]. Jiangsu Journal of Agricultural Sciences, 2021, (02): 503-508. [doi:10.3969/j.issn.1000-4440.2021.02.028]

Discrimination of aristolochic acid and its analogues based on Stacking ensemble learning

Jiangsu Journal of Agricultural Sciences [ISSN: 1000-4440]

Issue:
2021, No. 02
Pages:
503-508
Section:
Processing and Storage · Quality and Safety
Publication date:
2021-04-30

Article Info

Title:
Discrimination of aristolochic acid and its analogues based on stacking ensemble learning
Author(s):
XIE Wen-yong(1,2), CHAI Qin-qin(1,2), LIN Ni(3), LI Xiang-hui(3), WANG Wu(1,2)
(1. College of Electrical Engineering and Automation, Fuzhou University, Fuzhou 350108, China; 2. Fujian Key Laboratory of Medical Instrument and Pharmaceutical Technology, Fuzhou 350108, China; 3. School of Medical Technology and Engineering, Fujian Medical University, Fuzhou 350004, China)
Keywords:
aristolochic acid; near-infrared spectroscopy; principal component analysis; Stacking ensemble learning
CLC number:
TP391
DOI:
doi:10.3969/j.issn.1000-4440.2021.02.028
Document code:
A
Abstract:
Aristolochic acid and its analogues, which occur in Chinese herbal medicines, were taken as the research objects. Traditional identification of Chinese medicines is highly subjective and operationally complex, and single machine-learning classifiers offer limited discrimination accuracy. To address these problems, a multi-model fusion classification model based on Stacking ensemble learning was proposed to discriminate aristolochic acid and its analogues. Near-infrared spectra of four samples (aristolochic acid, 1,10-phenanthroline-4,7-dicarboxylic acid, phenanthrenequinone and β-sitosterol) were collected, preprocessed, and reduced in dimensionality by principal component analysis (PCA). Based on the reduced features, a traversal search strategy was used to construct a Stacking model with random forest (RF), support vector machine (SVM) and naive Bayes (NB) as base classifiers and RF as the meta classifier. The results showed that the Stacking model performed best, reaching a maximum discrimination accuracy of 99.38%, which was 8.23 percentage points higher than the average discrimination accuracy of the K-nearest neighbor (KNN), decision tree (DT), RF, SVM and NB classification models. It also performed well in precision, recall and the comprehensive evaluation index (F1 score). Therefore, the proposed Stacking model can discriminate aristolochic acid and its analogues quickly and effectively.
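The abstract does not state the preprocessing steps, the number of principal components, or any classifier hyperparameters, so the following is only a minimal sketch under stated assumptions: it uses scikit-learn, standardization as the preprocessing step, 10 principal components, an RBF-kernel SVM, and a synthetic 4-class data set from `make_classification` standing in for the NIR spectra (no real spectral data). It shows how PCA can be chained with a Stacking model, and one possible form of a traversal search in which every subset of two or three candidate base classifiers (KNN, DT, RF, SVM, NB) is paired with every candidate meta classifier and the configuration with the best validation accuracy is kept. The helper names `candidate_models`, `build_stacking` and `traversal_search` are hypothetical.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def candidate_models():
    """Fresh, unfitted instances of the five single classifiers (hyperparameters assumed)."""
    return {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "DT": DecisionTreeClassifier(random_state=0),
        "RF": RandomForestClassifier(n_estimators=25, random_state=0),
        "SVM": SVC(kernel="rbf"),
        "NB": GaussianNB(),
    }


def build_stacking(base_names, meta_name, n_components=10):
    """Standardize -> PCA -> Stacking; scaler and PCA are fit only on data passed to fit()."""
    models = candidate_models()
    stack = StackingClassifier(
        estimators=[(name, models[name]) for name in base_names],
        final_estimator=candidate_models()[meta_name],
        cv=3,  # out-of-fold base-classifier outputs are used to train the meta classifier
    )
    return make_pipeline(StandardScaler(), PCA(n_components=n_components), stack)


def traversal_search(X_tr, y_tr, X_val, y_val, max_bases=3):
    """Try every base subset (size 2..max_bases) with every meta classifier."""
    names = list(candidate_models())
    best = (-1.0, None, None)  # (validation accuracy, base names, meta name)
    for k in range(2, max_bases + 1):
        for bases in combinations(names, k):
            for meta in names:
                model = build_stacking(bases, meta).fit(X_tr, y_tr)
                acc = accuracy_score(y_val, model.predict(X_val))
                if acc > best[0]:
                    best = (acc, bases, meta)
    return best


# Synthetic stand-in for NIR spectra: 4 classes, 200 partly redundant "wavelength" features.
X, y = make_classification(n_samples=240, n_features=200, n_informative=12,
                           n_redundant=40, n_classes=4, n_clusters_per_class=1,
                           random_state=0)
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=0)

# Select the configuration on the validation split, then refit and score on the held-out test split.
acc, bases, meta = traversal_search(X_tr, y_tr, X_val, y_val)
final = build_stacking(bases, meta).fit(X_tmp, y_tmp)
y_pred = final.predict(X_test)
test_acc = accuracy_score(y_test, y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0)
print(f"bases={bases}, meta={meta}, validation accuracy={acc:.3f}")
print(f"test: accuracy={test_acc:.3f} precision={prec:.3f} recall={rec:.3f} F1={f1:.3f}")
```

To fit the configuration reported in the abstract directly, call `build_stacking(("RF", "SVM", "NB"), "RF")`. Keeping PCA inside the pipeline stops the validation and test samples from influencing the dimensionality reduction; capping subsets at three base classifiers only keeps the run short and is an assumption, not a detail from the paper.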



Memo:
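Received: 2020-08-04
Funding: National Natural Science Foundation of China (61773124); Research Project of the Fuzhou University Science and Education Park Development Center, Jinjiang (2019-JJFDKY-48)
About the first author: XIE Wen-yong (1994-), male, born in Sanming, Fujian; master's student, mainly engaged in machine learning research. E-mail: 1024396820@qq.com
Corresponding author: CHAI Qin-qin, E-mail: qq.chai@fzu.edu.cn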
Last Update: 2021-05-10