Hubei Agricultural Sciences ›› 2022, Vol. 61 ›› Issue (13): 151-155. doi: 10.14088/j.cnki.issn0439-8114.2022.13.028

• Information Engineering •

Research on short text classification in agricultural field based on few-shot learning

MA Zhi-run1,2, FEI Fan1, LI Fen1, DONG Hui-jie1, PENG Lin1

  1. College of Big Data, Yunnan Agricultural University, Kunming 650000, China;
  2. Green Agricultural Products Big Data Intelligent Information Processing Engineering Research Center, Kunming 650000, China
  • Received: 2021-06-03 Online: 2022-07-10 Published: 2022-08-10
  • Corresponding author: PENG Lin (1978-), female, from Dengzhou, Henan, associate professor, mainly engaged in research on question answering systems and knowledge graphs, (e-mail) dapengjiao@163.com.
  • About the author: MA Zhi-run (1992-), male, from Hefei, Anhui, master's student, research interest: text classification, (tel.) 19942414389, (e-mail) rudolph0ma@126.com
  • Funding:
    Yunnan Provincial Major Science and Technology Special Project (202002AD080002)

Abstract: To identify the category of a question described in massive amounts of information conveniently, accurately and efficiently, and to address the data sparsity and strong context dependence of short text classification in the agricultural field, more than 10,000 short texts were crawled from the agricultural question-and-answer domain and, after cleaning, filtering and annotation, formed into a five-class short text dataset. Agricultural short text classification algorithms based on the BERT and ERNIE pre-trained models were constructed and compared with an agricultural short text classification algorithm based on a decision tree model. The results showed that as the number of samples in the dataset decreased, the accuracy, precision and recall of all three models declined. The accuracy and F1 value of the ERNIE-based model remained at a high level, far higher than those of the decision tree model on the same data, indicating that the constructed agricultural short text classification algorithm can still achieve good classification performance when data are insufficient.

Key words: pre-training model, text classification, BERT, ERNIE

CLC number:
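
The abstract compares the three models on accuracy, precision, recall and F1 value over a five-class dataset, but it does not state how the per-class scores are combined. The following is a minimal sketch of these four metrics, assuming macro averaging (an unweighted mean over classes). The function name `macro_report`, the integer class labels and the toy predictions are hypothetical illustrations, not the authors' evaluation code or data.

```python
from collections import Counter


def macro_report(y_true, y_pred, labels):
    """Return accuracy and macro-averaged precision, recall and F1.

    Macro averaging is an assumption; the abstract does not say which
    averaging scheme the paper uses.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with five hypothetical classes labelled 0..4.
labels = [0, 1, 2, 3, 4]
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 1, 2, 2, 2, 3, 3, 4, 0]
report = macro_report(y_true, y_pred, labels)
print(report)
```

Running the same function on the predictions of each model at each training-set size would give the kind of comparison the abstract describes. With imbalanced classes, macro and weighted averages can differ noticeably, so the averaging choice should be checked against the full paper.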