Full metadata record
DC Field | Value | Language
dc.contributor.author | Tsay, Jyh-Jong |
dc.contributor.author | Wang, Jing-Doo |
dc.date.accessioned | 2009-06-02T07:21:47Z |
dc.date.accessioned | 2020-05-29T06:16:48Z | -
dc.date.available | 2009-06-02T07:21:47Z |
dc.date.available | 2020-05-29T06:16:48Z | -
dc.date.issued | 2006-11-13 |
dc.date.submitted | 1999-12-20 |
dc.identifier.uri | http://dspace.fcu.edu.tw/handle/2377/3091 | -
dc.description.abstract | In this paper, we make an extensive comparison of three classifiers for Chinese text classification: the naive Bayes (NB) probabilistic classifier, the Rocchio linear classifier, and the k-Nearest Neighbor (kNN) classifier. Our goal is to compare their performance when they are integrated with term selection, term clustering, and instance selection methods. Our experiments use one year of CNA news articles to extract meaningful terms, one month of news articles as training data, and three days of news articles as testing data. When the dimension of the term space is high, about 90,000, the Rocchio linear classifier achieves the best average accuracy, 79.35%. This observation differs from previous research, in which Rocchio performed relatively poorly. When the dimension is reduced to 3,600 by a combination of term selection and term clustering, kNN achieves the best average accuracy, 80.24%. We further use the Generalized Instance Set (GIS) algorithm [13] to reduce the size of the training data and hence speed up on-line classification with kNN. Experiments show that applying GIS reduces the number of training instances from 6,254 to 1,195 while improving the accuracy of kNN from 80.24% to 81.12%. The accuracy most recently reported by related research is about 78%. |
dc.description.sponsorship | Tamkang University (淡江大學), Taipei County |
dc.format.extent | 8p. |
dc.format.extent | 768934 bytes |
dc.format.mimetype | application/pdf |
dc.language.iso | zh_TW |
dc.relation.ispartofseries | 1999 NCS Conference |
dc.subject | Text Categorization |
dc.subject | Term Selection |
dc.subject | Term Clustering |
dc.subject | naive Bayes |
dc.subject | Rocchio |
dc.subject | k-Nearest Neighbor |
dc.subject.other | Information Retrieval and Data Mining |
dc.title | Comparing Classifiers for Automatic Chinese Text Categorization |
dc.title.alternative | 中文文件自動化分類方法之比較 (A Comparison of Automatic Classification Methods for Chinese Documents) |
Appears in collections: 1999 NCS National Computer Symposium

Files in this item:
File | Description | Size | Format
ce07ncs001999000115.pdf | | 759.04 kB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.