China Agricultural Journals Integrated Service Platform

Journal of Library and Information Science in Agriculture, 2016 (11), 50-54

Research on a Text Classification Model Based on Random Forests

Author:

LUO Xin

Affiliation:

School of Business Administration, South China University of Technology

Keywords:

Text classification; Random forests; CART (Classification and Regression Tree)

Abstract:

Text classification is a key technology for processing large volumes of text data and can go a long way toward addressing the problems caused by the "information explosion". The random forests algorithm proposed by Breiman generalizes well, is robust, is insensitive to noise, and can handle continuous attributes, which makes it well suited to building text classification models. This paper introduces the random forests algorithm into text classification on an exploratory basis, constructs a text classification model based on random forests, and tests and compares it on the standard text test collection Reuters-21578. The results show that: (1) the model can be applied to text classification well; (2) compared with text classification models based on CART, REPTree and J48, the model based on random forests performs best, with an F1-Measure of 0.777; (3) the model based on random forests is easy to operate, intuitive and effective, and gives reliable evaluation results, offering a new approach for text classification research.
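The abstract names three ingredients: CART-style trees, a random forest built from them, and the F1-Measure used for evaluation. The sketch below illustrates how these fit together in plain Python. It is not the author's implementation: the toy term-count vectors, the vocabulary, the class labels and every function name are hypothetical, and the data is not drawn from Reuters-21578.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (the CART split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def best_split(rows, labels, features):
    """Return (score, feature, threshold) with the lowest weighted Gini, or None."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, n_sub, rng, depth=0, max_depth=5):
    """Grow one tree; each node considers only a random subset of n_sub features."""
    if len(set(labels)) == 1 or depth >= max_depth:
        return majority(labels)
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_sub))
    if split is None:
        return majority(labels)
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left],
                       n_sub, rng, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right],
                       n_sub, rng, depth + 1, max_depth))

def predict_tree(node, row):
    """Walk from the root to a leaf; leaves are plain labels, inner nodes are tuples."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(rows[0]) ** 0.5))  # common heuristic: sqrt(#features)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_sub, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the individual trees."""
    return majority([predict_tree(tree, row) for tree in forest])

def f1_score(y_true, y_pred, positive):
    """F1 = 2PR / (P + R) for one class treated as positive."""
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == positive and b == positive)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a != positive and b == positive)
    fn = sum(1 for a, b in zip(y_true, y_pred) if a == positive and b != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical term counts over the vocabulary ["wheat", "grain", "oil", "crude"].
docs = [[3, 1, 0, 0], [2, 2, 0, 0], [1, 3, 0, 1],
        [0, 0, 2, 3], [0, 1, 3, 1], [0, 0, 1, 2]]
labels = ["grain", "grain", "grain", "crude", "crude", "crude"]

forest = fit_forest(docs, labels, n_trees=25, seed=0)
predicted = [predict_forest(forest, d) for d in docs]
print("F1 for class 'grain' on the training documents:",
      f1_score(labels, predicted, positive="grain"))
```

The score printed here is computed on the same six documents used for training, so it only demonstrates the mechanics and says nothing about the 0.777 reported in the abstract. A real experiment would use a held-out split of the corpus and a weighted feature representation such as TF-IDF.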
