Journal of Henan Institute of Science and Technology (Natural Science Edition), 2018 (5): 66-72

Research on feature selection methods and three-way classification algorithms in imbalanced datasets

Authors:
Liu Jie (刘杰); Su Huizhe (苏慧哲); Li Yancui (李艳翠)
Keywords:
text classification; imbalanced datasets; feature extraction; three-way decision; naive Bayes algorithm
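Abstract:
To address the poor classification performance of traditional feature selection methods on imbalanced datasets, an improved feature selection method suited to imbalanced data classification is proposed. The method combines concentration and dispersion, and also accounts for the effect of term frequency on text classification when texts vary in length, yielding a new term frequency normalization method that improves on traditional feature extraction. In addition, the three-way decision idea is introduced into the naive Bayes algorithm to obtain an NB three-way decision classification algorithm, which is applied to the classification of imbalanced datasets. Two groups of comparative experiments show that, compared with the CHI and IG methods, the improved feature selection method classifies highly imbalanced datasets better, and that, with the same feature selection method and dataset, the NB three-way classifier outperforms the NB classifier. Using the proposed feature selection method together with the NB three-way classifier gives some improvement in classification performance on datasets that are highly imbalanced and contain texts of varying length.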
Translated title:
Research on improved feature selection method and NB three-way classification algorithm in unbalanced dataset
Authors:
LIU Jie; SU Huizhe; LI Yancui
Affiliation:
Department of Information Engineering, Henan Institute of Science and Technology
Keywords:
text categorization; imbalanced datasets; feature selection; three-way decision; Naive Bayes algorithm
Abstract:
To address the poor performance of traditional classification algorithms on imbalanced data, an improved feature selection method is proposed. It combines concentration information and distribution information, and it takes into account the influence of word frequency on text classification when text lengths differ. A new word frequency normalization method is obtained, improving on the traditional feature extraction method. A new NB three-way decision classification algorithm is also put forward, combining three-way decision theory with the Naive Bayes algorithm, and the new NB three-way classifier is applied to imbalanced text categorization. Two groups of comparative experiments show that the improved method achieves a better classification effect than CHI and IG. With the same feature selection method and the same imbalanced datasets, the NB three-way classifier performs better than the NB classifier. Finally, the feature selection method and the NB three-way classifier proposed in this paper, used together on imbalanced datasets, improve the classification effect.
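
The three-way decision idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's thresholds or decision rule, so everything here is an assumption: the function name three_way_decide, the threshold values alpha = 0.7 and beta = 0.3, and the use of a single posterior probability P(positive | document) as input (for example, one produced by a naive Bayes classifier). It follows the general three-way decision scheme of accepting, rejecting, or deferring, and is not the authors' algorithm.

```python
from typing import Literal

Decision = Literal["accept", "reject", "defer"]


def three_way_decide(posterior: float, alpha: float = 0.7, beta: float = 0.3) -> Decision:
    """Map a posterior P(positive | document) to one of three regions.

    alpha and beta are illustrative thresholds (assumptions), not values
    taken from the paper.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if posterior >= alpha:
        return "accept"  # positive region: assign the document to the class
    if posterior <= beta:
        return "reject"  # negative region: exclude the document from the class
    return "defer"       # boundary region: postpone the decision


# Hypothetical posteriors, e.g. as output by a naive Bayes classifier.
decisions = [three_way_decide(p) for p in (0.92, 0.50, 0.08)]
```

A two-way classifier would force the middle case into one class. Deferring it instead is what lets a three-way classifier avoid committing on uncertain documents, which may matter most for minority-class samples in imbalanced data.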
