Title: | Improvements of Smoothing Methods for Language Models |
Authors: | Huang, Feng-Long; Lin, Yih-Jeng |
Keywords: | Language model; Smoothing method; Good-Turing; Cross entropy; Redistribution |
Journal/Conference: | 2004 ICS Conference |
Abstract: | We study improvements to the well-known Good-Turing smoothing and propose a novel scheme for redistributing probability to unseen events. Smoothing methods resolve the zero-count problem in traditional language models. A cut-off value k on the count is used to improve Good-Turing smoothing, and the best k for various training-data sizes N is analyzed. Smoothing techniques involve two basic processes: 1) discounting and 2) redistributing. Instead of the uniform assignment of probability to each unseen event used by several well-known methods, we propose a new redistribution scheme: based on the probabilistic behavior of seen events, the redistribution is non-uniform. Empirical results for both improvements are presented and analyzed. The improvements discussed in this paper are apparent and effective for smoothing methods, especially at higher unseen-event rates. |
Date: | 2006-10-11T08:05:06Z |
Category: | 2004 ICS International Computer Conference |
Files in this item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| ce07ics002004000091.pdf | | 255.27 kB | Adobe PDF | View/Open |
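The abstract's two-step view of smoothing (discount the counts of seen events, then redistribute the freed mass to unseen events) can be illustrated with a minimal sketch of classic Good-Turing smoothing. This is not the paper's improved method: it uses the textbook discounted count r* = (r+1)·N_{r+1}/N_r and the uniform redistribution of the reserved mass N_1/N that the paper argues against; the function name and the fallback when N_{r+1}=0 are illustrative assumptions.

```python
from collections import Counter

def good_turing(counts, vocab_size):
    """Baseline Good-Turing smoothing (illustrative sketch).

    Step 1 (discounting): replace each seen count r with
    r* = (r + 1) * N_{r+1} / N_r, where N_r is the number of
    event types seen exactly r times.
    Step 2 (redistributing): reserve total mass N_1 / N for
    unseen events and, in this baseline, spread it uniformly.
    """
    N = sum(counts.values())
    # N_r: frequency of frequencies
    freq_of_freq = Counter(counts.values())
    probs = {}
    for event, r in counts.items():
        n_r = freq_of_freq[r]
        n_r1 = freq_of_freq.get(r + 1, 0)
        # Fall back to the raw count when N_{r+1} = 0 (an
        # assumption here; real systems smooth the N_r curve).
        r_star = (r + 1) * n_r1 / n_r if n_r1 > 0 else r
        probs[event] = r_star / N
    # Total mass reserved for all unseen events combined
    unseen_mass = freq_of_freq.get(1, 0) / N
    n_unseen = vocab_size - len(counts)
    # Uniform per-event share -- the baseline the paper improves on
    p_unseen = unseen_mass / n_unseen if n_unseen else 0.0
    return probs, p_unseen

# Toy example: letter counts a:5, b:2, r:2, c:1, d:1 (N = 11)
counts = Counter("abracadabra")
probs, p_unseen = good_turing(counts, vocab_size=26)
```

In this toy run, N_1 = 2 of the 11 observations are singletons, so mass 2/11 is held back and split evenly over the 21 unseen letters; the paper's proposal is to make that split non-uniform, guided by the probabilistic behavior of the seen events.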