Journal of Yangtze University (Natural Science Edition), Agricultural Science Volume, 2008, 5(2): 7+101-104

A Study of a Multi-level Text Classification Method

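Authors:
XIAO Hong; LIU Shu-hua
Affiliations:
No. 2 Oil Production Plant, Daqing Oilfield Company Ltd.; School of Computer and Information Technology, Daqing Petroleum Institute
Keywords:
text classification; K-nearest neighbor (KNN) classification algorithm; vector space model
Abstract:
To address the shortcomings of existing rule-based and statistics-based text classification methods, a novel method is proposed that combines rules with K-nearest neighbor (KNN) classification. First, the traditional vector space model used to describe text features is extended, and the specific extended model is given. Next, a rule representation based on the extended model is proposed, and each rule is assigned a strength coefficient, by which the recognized texts can be ranked by level. Finally, a threshold is set, and texts whose level falls below it are filtered out. The method effectively excludes texts misidentified by KNN classification and thereby improves classification accuracy to some extent. Tests on a small data set indicate that the method is effective and feasible.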
Translated title:
Study of Multi-level Text Classification Based on Combination of Rule and KNN Algorithm
Authors:
XIAO Hong (Daqing Petroleum Institute, Daqing 163318); LIU Shu-hua (Daqing Oilfield Company Ltd, Daqing 163414)
Keywords:
text classification; KNN algorithm; vector space model
Abstract:
There are two approaches to text classification: one based on rules, the other on statistics. Each has merits and defects. To address their respective shortcomings, an effective text classification method is proposed that combines KNN with a rule-based method. The conventional VSM description is extended, and a detailed description of the extended VSM is given. Based on it, a method for expressing rules is presented. Each rule is assigned a coefficient that indicates its accuracy; the results are sorted by this coefficient, and documents whose coefficients fall below a given threshold are filtered out. Hence, documents incorrectly identified by the KNN method are excluded, and precision and recall are improved to a certain extent. Experimental results show that the method is effective and feasible.
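
The abstract gives only the outline of the pipeline (KNN classification, rule coefficients, ranking, threshold filtering), so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. Plain term-frequency vectors stand in for the extended VSM, which the abstract does not specify. The rule format (category, required terms, strength coefficient), the coefficient values, the threshold of 0.5, and the sample documents are all hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    # Assumption: whitespace tokens and raw term frequencies,
    # used here in place of the paper's (unspecified) extended VSM.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_label(doc_vec, training, k=3):
    # Majority vote among the k most similar training documents.
    ranked = sorted(training, key=lambda item: cosine(doc_vec, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical rules: (category, terms that must all occur, strength coefficient).
RULES = [
    ("oil", {"well", "drilling"}, 0.9),
    ("oil", {"pump"}, 0.6),
    ("computing", {"algorithm"}, 0.8),
]

def rule_level(doc_vec, label, rules):
    # Level of a document: strongest coefficient among the rules of the
    # KNN-predicted category that fire; 0.0 when no rule supports the label.
    fired = [s for cat, terms, s in rules if cat == label and terms.issubset(doc_vec)]
    return max(fired, default=0.0)

def classify_and_filter(texts, training, rules, threshold=0.5, k=3):
    kept = []
    for text in texts:
        vec = vectorize(text)
        label = knn_label(vec, training, k)
        level = rule_level(vec, label, rules)
        if level >= threshold:  # drop KNN results with weak rule support
            kept.append((text, label, level))
    return sorted(kept, key=lambda r: r[2], reverse=True)

# Hypothetical training set and test documents.
training = [
    (vectorize("drilling a new well in the field"), "oil"),
    (vectorize("well pump maintenance schedule"), "oil"),
    (vectorize("sorting algorithm complexity analysis"), "computing"),
    (vectorize("graph algorithm for search"), "computing"),
]
texts = ["drilling the well started", "pump repair",
         "algorithm design", "well water quality"]

ranked = classify_and_filter(texts, training, RULES)
for text, label, level in ranked:
    print(f"{level:.1f}  {label:10s}  {text}")
```

In this toy run, KNN assigns "well water quality" to the oil category because it shares the term "well" with the training documents, but no oil rule fires for it, so its level is 0.0 and the threshold removes it. That is the kind of KNN misidentification the abstract says the rule-based filter is meant to exclude.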
