Hubei Agricultural Sciences ›› 2022, Vol. 61 ›› Issue (18): 196-202. doi: 10.14088/j.cnki.issn0439-8114.2022.18.035

• Information Engineering •

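Named entity recognition in agriculture based on BERT embedding and adversarial training

FEI Fan, YANG Lin-nan

  1. College of Big Data, Yunnan Agricultural University, Kunming 650201, China
  • Received: 2021-07-14 Online: 2022-09-25 Published: 2022-10-21
  • Corresponding author: YANG Lin-nan (1964-), male, from Baoshan, Yunnan; professor, PhD; main research area: agricultural informatization; (Tel) 13888263241, (E-mail) lny5400@163.com.
  • About the author: FEI Fan (1994-), male, from Anqing, Anhui; master's student; research area: natural language processing; (Tel) 17555121687, (E-mail) 1962288205@qq.com.
  • Funding:
    Major Science and Technology Special Program of Yunnan Province (202002AD080002)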

Abstract: Named entity recognition (NER) in the agricultural domain is a key step in agricultural information extraction and question answering systems. Its goal is to locate the required named entities in large volumes of unstructured agricultural text, a task complicated by diverse entity names and missing contextual semantics. To recognize agricultural named entities in complex contexts, this paper first constructs an annotated agricultural corpus containing 16 048 samples across six entity types. It then uses the BERT pre-trained language model as the word embedding layer; compared with traditional word embedding models, BERT better handles the differing meanings and referents of the same word in different contexts. Next, a BiGRU network encodes the context, and finally a CRF layer optimizes the output tag sequence. In addition, noise is added to the input data for adversarial training, which improves the generalization and robustness of the model. In the experiments, the proposed model achieves a precision, recall, and F-value of 92.75%, 91.53%, and 92.49%, respectively. It outperforms the baseline model and effectively identifies named entities in the agricultural domain.
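The abstract states only that noise is added to the input for adversarial training; it does not name the perturbation scheme. As a minimal sketch, assuming an FGM-style (fast gradient method) perturbation applied to the embedding vectors, the noise can be computed as r = epsilon * g / ||g||, where g is the gradient of the loss with respect to the embeddings. The function names, the value of epsilon, and the toy gradient below are hypothetical; in real training the gradient would come from backpropagation.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Scale a gradient (list of rows) to L2 norm epsilon: r = epsilon * g / ||g||."""
    norm = math.sqrt(sum(g * g for row in grad for g in row))
    if norm == 0.0:
        # A zero gradient gives no direction to perturb in.
        return [[0.0 for _ in row] for row in grad]
    return [[epsilon * g / norm for g in row] for row in grad]


def add_perturbation(embeddings, perturbation):
    """Return the adversarial embeddings e + r, leaving the inputs untouched."""
    return [
        [e + r for e, r in zip(e_row, r_row)]
        for e_row, r_row in zip(embeddings, perturbation)
    ]


# Toy values (assumed): two tokens with 2-dimensional embeddings.
embeddings = [[0.5, -0.2], [0.1, 0.3]]
grad = [[3.0, 0.0], [0.0, 4.0]]  # stand-in for dLoss/dEmbedding, L2 norm 5

delta = fgm_perturbation(grad, epsilon=0.5)
adv_embeddings = add_perturbation(embeddings, delta)
print(delta)
print(adv_embeddings)
```

In a scheme of this kind, the model is typically trained on the loss from both the clean and the perturbed embeddings, and the perturbation is discarded after each step.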

Key words: agriculture, natural language processing, named entity recognition, information extraction, BERT, BiGRU, adversarial training
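The CRF step named in the abstract scores whole tag sequences rather than individual tokens, so it can rule out invalid label transitions. A minimal sketch of the decoding side, assuming a linear-chain CRF decoded with the Viterbi algorithm, is given below. The tag names (`B-CROP`, `I-CROP`), the emission scores, and the transition scores are hypothetical illustrations; the abstract does not list the six entity types or any learned parameters.

```python
def viterbi_decode(emissions, transitions, tags):
    """Return the highest-scoring tag path.

    emissions: one dict per token mapping tag -> score (e.g. from the encoder).
    transitions: dict mapping (previous_tag, current_tag) -> score; missing pairs score 0.
    """
    scores = {t: emissions[0][t] for t in tags}  # best score ending in each tag
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = {}, {}
        for cur in tags:
            best_prev = max(
                tags, key=lambda p: scores[p] + transitions.get((p, cur), 0.0)
            )
            new_scores[cur] = (
                scores[best_prev] + transitions.get((best_prev, cur), 0.0) + emit[cur]
            )
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: scores[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


tags = ["O", "B-CROP", "I-CROP"]
transitions = {
    ("O", "I-CROP"): -10.0,      # an I- tag should not follow O
    ("B-CROP", "I-CROP"): 1.0,   # an I- tag naturally follows its B- tag
}
emissions = [
    {"O": 1.0, "B-CROP": 0.9, "I-CROP": 0.0},
    {"O": 0.2, "B-CROP": 0.1, "I-CROP": 1.5},
    {"O": 2.0, "B-CROP": 0.0, "I-CROP": 0.3},
]
print(viterbi_decode(emissions, transitions, tags))  # ['B-CROP', 'I-CROP', 'O']
```

Picking the best tag per token independently would give `O, I-CROP, O` on these toy scores, which is not a valid BIO sequence; the transition scores steer the decoder to `B-CROP, I-CROP, O` instead.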
