Journal of Shandong Agricultural University (Natural Science Edition), 2014, (3): 360-365

Research on Corpus in a Field

Authors:
HE Yan; DING Ling
Affiliations:
Zunyi Medical and Pharmaceutical College; Harbin Institute of Technology Shenzhen Graduate School
Keywords:
corpus; ontology structure; classification system
Abstract:
As online information grows increasingly complex, classification techniques have come into wide use. These techniques generally require a standard corpus as a training set, and such corpora are usually built by manual annotation to ensure that they are standardized and accurate. Manual annotation makes the development cycle relatively long and the workload heavy, and it makes the classification scheme difficult to change. To address this problem, this paper studies how to obtain corpus data from domain websites according to their ontology structure, and how to reorganize and purify that data according to a given classification system, ultimately producing a high-quality classified corpus.
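The abstract does not give implementation details. Purely as an illustration, the pipeline it describes (harvest pages from a domain site, map each page's position in the site's category hierarchy onto a target classification system, then purify the result) might be sketched as below. The category paths, the hypothetical `CATEGORY_MAP`, the `MIN_CHARS` threshold, and the hash-based exact-duplicate filter are all assumptions chosen for the sketch, not the authors' method.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical mapping from a site's navigation (ontology) path prefix
# to a class in the target classification system.
CATEGORY_MAP = {
    ("health", "diseases"): "medicine",
    ("health", "drugs"): "medicine",
    ("tech", "software"): "computing",
}

MIN_CHARS = 50  # assumed purification threshold: drop very short pages


def map_to_class(path, mapping):
    """Return the class for the most specific mapped prefix of `path`, or None."""
    for depth in range(len(path), 0, -1):
        label = mapping.get(tuple(path[:depth]))
        if label is not None:
            return label
    return None


def normalize(text):
    """Collapse runs of whitespace so trivially different copies compare equal."""
    return re.sub(r"\s+", " ", text).strip()


def build_corpus(pages, mapping, min_chars=MIN_CHARS):
    """Reorganize (path, text) pages into {class: [texts]} and purify them."""
    corpus = defaultdict(list)
    seen = set()
    for path, text in pages:
        label = map_to_class(path, mapping)
        if label is None:
            continue  # page lies outside the target classification system
        clean = normalize(text)
        if len(clean) < min_chars:
            continue  # too short to be a useful training sample
        digest = hashlib.sha1(clean.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate (after normalization) of a kept page
        seen.add(digest)
        corpus[label].append(clean)
    return dict(corpus)
```

Longest-prefix matching lets a deep page such as `["health", "diseases", "flu"]` inherit the class of its nearest mapped ancestor, which is one simple way to reuse a site's existing hierarchy instead of annotating pages by hand.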