Title: | Web Document Classification based on Tagged-Region Progressive Analysis |
Authors: | Sung, Li-Chun; Chen, Meng-Chang; Kao, Chin-Hwa |
Keywords: | Web categorization; Progressive Analysis |
Journal/Conference: | 2004 ICS Conference |
Abstract: | This paper proposes an intelligent web document classification method called TAgged-Region Progressive Analysis (TARPA). Instead of parsing the whole content of a web page in order to classify it, TARPA parses the document into finer-grained, structured tagged regions and extracts only the few most important regions for analysis and classification. If these few regions are not sufficient to classify the document, other important regions and linked pages are analyzed progressively to improve classification performance. TARPA has two stages: a learning stage and a classification stage. The learning stage determines the relative importance of tag-pairs, and the classification stage analyzes the document by following that importance order. As a result, TARPA can classify a web document using less of its content while achieving a higher classification rate and a shorter processing time. Experiments show that 91% of the test web documents are classified correctly when the TARPA classifier is fed only 40% to 50% of the document contents. |
Date: | 2006-10-18T07:53:34Z |
Collection: | 2004 ICS International Computer Conference |
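
The progressive procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `RegionParser`, `classify_progressively`, `tag_order`, `profiles`, and `margin` are hypothetical, the tag importance order is hand-picked rather than learned, a simple keyword count stands in for the real classifier, and the linked-page step is omitted.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Tags that never have a closing tag; skipped so they do not capture text.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class RegionParser(HTMLParser):
    """Group text by its innermost enclosing tag (a rough 'tagged region')."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.regions = defaultdict(list)  # tag name -> list of text chunks

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.regions[self.stack[-1]].append(text)


def score(words, profiles):
    """Toy scorer (assumption): count keyword hits per class."""
    counts = Counter(words)
    return {label: sum(counts[w] for w in vocab) for label, vocab in profiles.items()}


def classify_progressively(html, tag_order, profiles, margin=2):
    """Add regions in importance order; stop once one class leads by `margin`.

    Expects at least two classes in `profiles`.
    Returns (label or None, list of tags actually consumed).
    """
    parser = RegionParser()
    parser.feed(html)
    words, used = [], []
    for tag in tag_order:
        chunks = parser.regions.get(tag, [])
        if not chunks:
            continue
        used.append(tag)
        for chunk in chunks:
            words.extend(chunk.lower().split())
        ranked = sorted(score(words, profiles).items(), key=lambda kv: kv[1], reverse=True)
        if ranked[0][1] - ranked[1][1] >= margin:
            return ranked[0][0], used  # confident enough: stop early
    return None, used  # undecided after all listed regions


# Hypothetical inputs for illustration only.
page = (
    "<html><head><title>Python tutorial for programming beginners</title></head>"
    "<body><h1>Learn programming</h1><p>Unrelated footer text about a football match.</p>"
    "</body></html>"
)
profiles = {
    "tech": {"python", "programming", "code"},
    "sports": {"football", "match", "goal"},
}
tag_order = ["title", "h1", "p"]  # assumed importance order, not a learned one

result = classify_progressively(page, tag_order, profiles)
undecided = classify_progressively("<p>hello</p>", tag_order, profiles)
```

In this toy run the title region alone separates the two classes, so the `h1` and `p` regions are never read. This mirrors the idea of classifying from a fraction of the document and widening the analysis only when the decision is unclear.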
Files in this item:
File | Description | Size | Format | |
---|---|---|---|---|
ce07ics002004000044.pdf | | 432.9 kB | Adobe PDF | View/Open |
All items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.