Title: | A Way to Assign Parts-of-Speech Information to Chinese Frequent Strings |
Authors: | Lin, Yih-Jeng; Huang, Feng-Long; Yu, Ming-Shing |
Keywords: | Chinese Frequent Strings Part-of-speech Treebank Parsing |
Journal/Conference: | 2002 ICS Conference |
Abstract: | A Chinese frequent string (CFS) is a frequently used combination of Chinese characters, as defined in our previous research [11]. A CFS may be a proper noun, such as “網際網路” (the Internet), a verb phrase, such as “全力動員投入” (try one’s best to mobilize), and so on. If POS (part-of-speech) information can be attached to a CFS, the CFS can be used in more applications. In this paper we propose a method for assigning part-of-speech information to CFSs. If a CFS s is also a word w, we assign the POSs of w to s. When s is a combination of several words, we try to find the possible POSs associated with it. We use the Sinica Treebank, which contains 38,725 parse trees, as our training and testing corpus. We extract 15,946 parsing rules from 90% of the 38,725 parse trees; the remaining 10% of the corpus is held out for the outside test. In the outside test, the accuracy of assigning POSs to CFSs is 71.02% for the top 1 choice and 98.81% for the top 5 choices. |
Date: | 2006-10-24 |
Category: | 2002 ICS International Computer Symposium |
Files in this item:
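
The abstract describes two cases: a CFS that is itself a word inherits that word's POSs, and a CFS made of several words gets candidate POSs derived from parsing rules extracted from a treebank. The following is a minimal sketch of that idea, not the authors' implementation. The tree encoding, the toy tags (`D`, `VC`, `Na`, `NP`, `VP`, `S`), the frequency-based ranking, the single-POS-per-word segmentation, and all names (`extract_rules`, `train`, `assign_pos`, `LEXICON`, `TOY_TREES`) are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical lexicon: word -> list of POS tags (toy data, not from the paper).
LEXICON = {"網際網路": ["Na"]}

# Hypothetical toy trees. A phrase is (label, [children]); a leaf is (POS, word).
TOY_TREES = [
    ("VP", [("D", "全力"), ("VC", "動員"), ("VC", "投入")]),
    ("S", [("NP", [("Na", "網際網路")]),
           ("VP", [("D", "全力"), ("VC", "動員"), ("VC", "投入")])]),
    ("S", [("D", "全力"), ("VC", "動員"), ("VC", "投入")]),
]


def extract_rules(tree, counts):
    """Count rules of the form label -> child labels, indexed by right-hand side."""
    label, rest = tree
    if isinstance(rest, str):  # leaf node: (POS, word), no rule to record
        return
    rhs = tuple(child[0] for child in rest)
    counts[rhs][label] += 1
    for child in rest:
        extract_rules(child, counts)


def train(trees):
    """Build a mapping: right-hand side (tuple of labels) -> Counter of left-hand sides."""
    counts = defaultdict(Counter)
    for tree in trees:
        extract_rules(tree, counts)
    return counts


def assign_pos(cfs, segmentation, lexicon, rules, k=5):
    """Return up to k candidate POS labels for a CFS.

    segmentation is a list of (word, POS) pairs; assuming one POS per word
    is a simplification made for this sketch.
    """
    if cfs in lexicon:  # case 1: the CFS is itself a word
        return lexicon[cfs][:k]
    rhs = tuple(pos for _, pos in segmentation)  # case 2: a combination of words
    return [label for label, _ in rules.get(rhs, Counter()).most_common(k)]


rules = train(TOY_TREES)
print(assign_pos("網際網路", [], LEXICON, rules))
print(assign_pos("全力動員投入",
                 [("全力", "D"), ("動員", "VC"), ("投入", "VC")],
                 LEXICON, rules))
```

Ranking candidates by rule frequency is one simple way to produce "top 1" and "top 5" lists like those evaluated in the abstract; the paper itself may score candidates differently.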
File | Description | Size | Format | |
---|---|---|---|---|
ce07ics002002000330.PDF | | 50.57 kB | Adobe PDF | View/Open |
All items in DSpace are protected by copyright, with all rights reserved, unless their copyright terms specifically state otherwise.