Title: | Text Categorization Using Latent Topics as Additional Features |
Authors: | Mizugai, Hiroshi; Paik, Incheon; Kanemoto, Shigeru |
Keywords: | Machine Learning; Text Categorization; Latent Topics; AdaBoost |
Journal/Conference Name: | 2008 ICS Conference |
Abstract: | In feature selection for text categorization, some methods handle word sense disambiguation by capturing synonymy and polysemy among the words in documents. One such approach uses a topic model to exploit the latent topics underlying the documents; PLSA and LDA have been proposed as representative models. In this paper, two feature sets, each combining TF-IDF values with latent topic values extracted automatically from topic models, are used for text categorization with AdaBoost. Their performance is then compared with that of TF-IDF features alone. On this basis, the study evaluates the effectiveness and weaknesses of the augmented features. |
Date: | 2009-02-12T02:15:47Z |
Category: | 2008 ICS International Computer Conference |
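
The feature augmentation described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy documents and labels, the hand-written `topics` values standing in for per-document topic proportions that a PLSA or LDA model would supply, the particular TF-IDF weighting, and the use of threshold stumps as AdaBoost's weak learner.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocab, rows) where rows[i][j] is tf * idf of term j in doc i."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        rows.append([tf[w] / len(toks) * math.log(n / df[w]) for w in vocab])
    return vocab, rows

def augment(tfidf_rows, topic_rows):
    """Append each document's topic proportions to its TF-IDF vector."""
    return [t + z for t, z in zip(tfidf_rows, topic_rows)]

def stump_predict(x, j, thr, sign):
    """Threshold stump: vote `sign` if feature j exceeds thr, else -sign."""
    return sign if x[j] > thr else -sign

def best_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_predict(x, j, thr, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, rounds=5):
    """Discrete AdaBoost over threshold stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, thr, sign = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, sign))
        # Raise the weight of misclassified documents, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(x, j, thr, sign))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(x, j, t, s) for a, j, t, s in model)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): +1 = finance, -1 = sports.
docs = ["stock market falls", "market rally lifts stock",
        "team wins final match", "coach praises team"]
labels = [1, 1, -1, -1]
# Hypothetical topic proportions; a real pipeline would infer these with a topic model.
topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

vocab, rows = tfidf(docs)
X = augment(rows, topics)
model = adaboost(X, labels)
predictions = [predict(model, x) for x in X]
```

Training the same `adaboost` function on `rows` alone, without the topic columns, gives the TF-IDF-only baseline that the abstract compares against.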
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
ce07ics002008000154.pdf | | 181.08 kB | Adobe PDF |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.