Title: A Scalable Approach for Chinese Term Extraction
Authors: Tsay, Jyh-Jong
Wang, Jing-Doo
Journal/Conference: 2000 ICS Conference
Abstract: Term extraction helps Information Retrieval (IR) systems achieve higher retrieval precision, a capability in demand for all Internet search tools. In this paper, we develop a scalable, dictionary-free approach based on the String B-tree (SB-tree) to identify significant terms in large amounts of Chinese text. Our approach consists of four steps: (i) text information database, (ii) SB-tree construction, (iii) candidate significant term extraction, and (iv) significant term validation. Our experiment uses three years of news from the Central News Agency (CNA) as the source for extracting significant terms. The corpus contains 220,395 news articles and 80,046,457 characters. With a training corpus spanning such a long period, we not only obtain robust term statistics, i.e. term frequency and document frequency, but can also detect some events from the distribution of significant terms over time intervals of different scales. This work is a foundational step toward a text data warehouse.
Date: 2006-11-17T03:39:38Z
Category: 2000 ICS (International Computer Symposium)

Files in this item:
File: ce07ics002000000199.pdf (1.72 MB, Adobe PDF)

