Title: Improvements of Smoothing Methods for Language Models
Authors: Huang, Feng-Long
Lin, Yih-Jeng
Keywords: Language model
Smoothing method
Good-Turing
Cross entropy
Redistribution
Journal/Conference: 2004 ICS Conference
Abstract: We study improvements to the well-known Good-Turing smoothing and propose a novel idea of probability redistribution for unseen events. Smoothing methods are used to resolve the zero-count problem in traditional language models. A cut-off value k on the number of counts is used to improve Good-Turing smoothing, and the best k for various training-data sizes N is analyzed. Basically, smoothing techniques involve two processes: 1) discounting and 2) redistributing. Instead of the uniform assignment of probability to each unseen event used by several well-known methods, we propose a new redistribution concept: based on the probabilistic behavior of seen events, the redistribution process is non-uniform. Empirical results for the two improvements are demonstrated and analyzed. The improvements discussed in the paper are apparent and effective for smoothing methods, especially at higher unseen-event rates.
Date: 2006-10-11T08:05:06Z
Category: 2004 ICS International Computer Symposium
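The two-step discount-and-redistribute scheme described in the abstract can be illustrated with a minimal Good-Turing sketch. This is not the paper's implementation: the function name, the exact handling of the cut-off k (counts above k are left undiscounted, as in Katz-style variants), and the fallback when a frequency-of-frequency is missing are all assumptions for illustration.

```python
from collections import Counter

def good_turing(counts, k=5):
    """Illustrative Good-Turing discounting with a count cut-off k.

    counts: dict mapping event -> observed count.
    Returns (probs, p_unseen) where p_unseen is the total probability
    mass reserved for all unseen events (to be redistributed).
    """
    N = sum(counts.values())                # total number of observations
    Nr = Counter(counts.values())           # Nr[r] = number of events seen r times
    probs = {}
    for event, r in counts.items():
        # Discount only low counts (r <= k); larger counts are trusted as-is.
        # Fall back to the raw count when Nr[r+1] is zero (assumption).
        if r <= k and Nr.get(r + 1, 0) > 0:
            r_star = (r + 1) * Nr[r + 1] / Nr[r]   # adjusted count r* = (r+1) N_{r+1} / N_r
        else:
            r_star = r
        probs[event] = r_star / N
    p_unseen = Nr.get(1, 0) / N             # unseen mass = N_1 / N (singleton mass)
    return probs, p_unseen
```

The returned `p_unseen` mass is what the redistribution step then divides among unseen events; the paper's contribution is to split it non-uniformly rather than equally.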

Files in this item:
File | Size | Format
ce07ics002004000091.pdf | 255.27 kB | Adobe PDF

