Agricultural Named Entity Recognition Based on Semantic Aggregation and Model Distillation
Authors:
LI Liangde; WANG Xiujuan; KANG Mengzhen; HUA Jing; FAN Menghan
Affiliations:
School of Artificial Intelligence, University of Chinese Academy of Sciences; Beijing Engineering Research Center of Intelligent Systems and Technology; Qingdao Smart AgriTech., Ltd.; The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences
Keywords:
distant supervision; agriculture knowledge graph; agriculture Q&A system; named entity recognition; knowledge distillation; deep learning; BERT; BiLSTM
Abstract:
Annotated data for agricultural named entity recognition is currently scarce, and some publicly available agricultural entity recognition models rely on hand-crafted features, yielding low recognition accuracy. Although deep-learning-based agricultural entity recognition models achieve better recognition performance, they suffer from high inference latency and large parameter counts. This study proposes an agricultural entity recognition method based on knowledge distillation. First, an agricultural knowledge graph was constructed from massive agricultural data on the Internet, and weakly labeled corpora were then obtained from it via distant supervision. Second, tailored to the characteristics of entity recognition, an attention-based BERT layer aggregation model (BERT-ALA) was proposed to fuse semantic features from different layers; combined with a bidirectional long short-term memory network (BiLSTM) and a conditional random field (CRF), this yields the BERT-ALA+BiLSTM+CRF model as the teacher model. Finally, a BiLSTM+CRF model was used as the student model to distill the teacher model, ensuring that prediction latency and parameter count meet the requirements of online serving. Experiments on the agricultural entity recognition dataset constructed in this study and on two public datasets show that the macro-F1 of the BERT-ALA+BiLSTM+CRF model improves over the baseline BERT+BiLSTM+CRF by 1% on average. The macro-F1 of the distilled student model BiLSTM+CRF improves by 3.3% on average over the same model trained on the original data, while prediction time is reduced by 33% and storage space by 98%. The experimental results verify the effectiveness of the attention-based BERT layer aggregation model and of knowledge distillation for agricultural entity recognition.
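The distant-supervision step described above (matching knowledge-graph entity strings against raw text to obtain weak labels) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `weak_label_bio`, the entity dictionary, and the entity types are assumptions.

```python
def weak_label_bio(chars, entity_dict):
    """Distant-supervision sketch: match knowledge-graph entity surface
    forms against a character sequence to produce weak BIO labels.
    `entity_dict` maps surface form -> entity type (illustrative)."""
    labels = ["O"] * len(chars)
    for surface, etype in entity_dict.items():
        ent = list(surface)
        n = len(ent)
        for i in range(len(chars) - n + 1):
            # only label spans that are still unlabeled
            if chars[i:i + n] == ent and all(l == "O" for l in labels[i:i + n]):
                labels[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + etype
    return labels

# toy demo on a character-tokenized Chinese phrase ("wheat rust control")
chars = list("小麦锈病防治")
print(weak_label_bio(chars, {"小麦": "CROP", "锈病": "DISEASE"}))
# ['B-CROP', 'I-CROP', 'B-DISEASE', 'I-DISEASE', 'O', 'O']
```

Labels produced this way are noisy (hence "weak"), which is exactly the trade-off distant supervision accepts in exchange for scale.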
Translated Title:
Agricultural Named Entity Recognition Based on Semantic Aggregation and Model Distillation
Authors:
LI Liangde; WANG Xiujuan; KANG Mengzhen; HUA Jing; FAN Menghan
Affiliations:
The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Beijing Engineering Research Center of Intelligent Systems and Technology; Qingdao Smart AgriTech., Ltd.
Keywords:
distant supervision; agriculture knowledge graph; agriculture Q&A system; named entity recognition; knowledge distillation; deep learning; BERT; BiLSTM
Abstract:
With the development of smart agriculture, automatic question answering (Q&A) over agricultural knowledge is needed to improve the efficiency of agricultural information acquisition. Agricultural named entity recognition (NER) plays a key role in an automatic Q&A system: it helps to extract information, understand agricultural questions, and retrieve answers from the knowledge graph. Owing to the scarcity of labeled agricultural NER data, some existing open agricultural entity recognition models rely on manual features, which reduces the accuracy of entity recognition. In this work, a model distillation approach was proposed for agricultural named entity recognition. Firstly, massive agricultural data were collected from the Internet and an agriculture knowledge graph (AgriKG) was constructed. To overcome the scarcity of labeled agricultural entity data, weak NER labels were generated for agricultural texts crawled from the Internet with the help of AgriKG. This approach derives from distant supervision, which was originally introduced to address the scarcity of labeled relation extraction data. Secondly, given the lack of labeled data, the large-scale pretrained language model BERT was used for agricultural NER: it provides good initial parameters containing a great deal of basic language knowledge and is fine-tuned on the existing labeled data. Considering that agricultural NER relies heavily on low-level semantic features but only slightly on high-level ones, an attention-based layer aggregation mechanism for BERT (BERT-ALA) was designed in this research to adaptively aggregate the outputs of BERT's multiple hidden layers. On top of BERT-ALA, a bidirectional LSTM (BiLSTM) and a conditional random field (CRF) were coupled to further improve recognition precision, giving a BERT-ALA+BiLSTM+CRF model.
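The core of BERT-ALA, adaptively weighting the outputs of all hidden layers rather than using only the last layer, can be sketched as a softmax-weighted sum over stacked layer outputs. This is an illustrative sketch (function names and shapes are assumptions), shown with dummy arrays standing in for BERT's per-layer outputs:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_layers(hidden_states, layer_scores):
    """Attention-weighted aggregation of per-layer outputs.

    hidden_states: (num_layers, seq_len, hidden) stacked layer outputs
    layer_scores:  (num_layers,) learnable attention scores
    """
    weights = softmax(layer_scores)                       # (num_layers,)
    return np.tensordot(weights, hidden_states, axes=1)   # (seq_len, hidden)

# toy demo: two "layers" with equal scores -> plain average of the layers
hs = np.stack([np.zeros((3, 4)), np.full((3, 4), 2.0)])  # (2, 3, 4)
out = aggregate_layers(hs, np.zeros(2))
print(out.shape)  # (3, 4); every entry equals 1.0
```

In training, `layer_scores` would be learned jointly with the rest of the network, letting the model emphasize the lower layers that the paper argues matter most for this task.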
The BiLSTM compensates for BERT's limited ability to learn relative position features, while the CRF models the dependencies among entity labels. Thirdly, since the BERT-ALA+BiLSTM+CRF model is difficult to serve online because of its high time and space complexity, a BiLSTM+CRF model was used as the student model to distill the teacher, fitting the teacher model's BiLSTM-layer and CRF-layer outputs. Experiments on the dataset constructed in this research and on two open datasets showed that (1) the macro-F1 of the BERT-ALA+BiLSTM+CRF model improved by 1% over the baseline BERT+BiLSTM+CRF, and (2) compared with the model trained on the original data, the macro-F1 of the distilled student model BiLSTM+CRF increased by an average of 3.3%, while prediction time was reduced by 33% and storage space by 98%. The experimental results verify the effectiveness of BERT-ALA and knowledge distillation in agricultural entity recognition.
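The distillation objective described above, where the student fits both the teacher's BiLSTM-layer output and its CRF-layer output, could be written as a weighted sum of two matching terms. A minimal sketch, assuming MSE for both terms and a mixing weight `alpha` (both are illustrative choices, not necessarily the paper's exact loss):

```python
import numpy as np

def distillation_loss(student_feats, teacher_feats,
                      student_scores, teacher_scores, alpha=0.5):
    """Student-teacher matching loss: the student fits the teacher's
    BiLSTM-layer features and its CRF emission scores.
    `alpha` and the MSE choice are assumptions for illustration."""
    feat_loss = np.mean((student_feats - teacher_feats) ** 2)
    score_loss = np.mean((student_scores - teacher_scores) ** 2)
    return alpha * feat_loss + (1.0 - alpha) * score_loss

# toy check: features off by 1 everywhere, scores already matched
loss = distillation_loss(np.zeros((4, 8)), np.ones((4, 8)),
                         np.ones((4, 5)), np.ones((4, 5)))
print(loss)  # 0.5
```

Because the student only needs the much smaller BiLSTM+CRF at inference time, this is what delivers the reported 33% latency and 98% storage reductions.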
Similar Articles
- Distant supervision relation extraction for agricultural pests and diseases based on a multi-layer attention mechanism [YUE Yi, WANG Wenyu, ZHANG Kai, LIANG Zhenjing, LIU Fei, CHEN Yiqiong, WU Yunzhi, ZHANG Youhua] Journal of Anhui Agricultural University 2020 (4) 682-686