Title: | Improvements of Smoothing Methods for Language Models |
Authors: | Huang, Feng-Long; Lin, Yih-Jeng |
Keywords: | Language model; Smoothing method; Good-Turing; Cross entropy; Redistribution |
Journal/Conference: | 2004 ICS Conference |
Abstract: | We study improvements to the well-known Good-Turing smoothing and propose a novel scheme for redistributing probability to unseen events. Smoothing methods resolve the zero-count problem in traditional language models. A cut-off value k on the count is used to improve Good-Turing smoothing, and the best k for various training-data sizes N is analyzed. Smoothing techniques involve two basic processes: 1) discounting and 2) redistributing. Instead of the uniform assignment of probability to each unseen event used by several well-known methods, we propose a new redistribution scheme: based on the probabilistic behavior of seen events, the redistribution is non-uniform. Empirical results for both improvements are presented and analyzed. The improvements discussed in this paper are apparent and effective for smoothing methods, especially at higher unseen-event rates. |
Date: | 2006-10-11T08:05:06Z |
Category: | 2004 ICS International Computer Conference |
Files in this item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| ce07ics002004000091.pdf | | 255.27 kB | Adobe PDF | View/Open |
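The abstract's two-step view of smoothing (discount the counts of seen events, then redistribute the freed mass to unseen events) can be illustrated with a minimal sketch of classic Good-Turing smoothing. This is not the paper's improved method: it uses the textbook discounted count r* = (r+1)·N_{r+1}/N_r and the uniform redistribution of the reserved mass N_1/N that the paper argues against; the function name and the fallback when N_{r+1}=0 are illustrative assumptions.

```python
from collections import Counter

def good_turing(counts, vocab_size):
    """Baseline Good-Turing smoothing (illustrative sketch).

    Step 1 (discounting): replace each seen count r with
    r* = (r + 1) * N_{r+1} / N_r, where N_r is the number of
    event types seen exactly r times.
    Step 2 (redistributing): reserve total mass N_1 / N for
    unseen events and, in this baseline, spread it uniformly.
    """
    N = sum(counts.values())
    # N_r: frequency of frequencies
    freq_of_freq = Counter(counts.values())
    probs = {}
    for event, r in counts.items():
        n_r = freq_of_freq[r]
        n_r1 = freq_of_freq.get(r + 1, 0)
        # Fall back to the raw count when N_{r+1} = 0 (an
        # assumption here; real systems smooth the N_r curve).
        r_star = (r + 1) * n_r1 / n_r if n_r1 > 0 else r
        probs[event] = r_star / N
    # Total mass reserved for all unseen events combined
    unseen_mass = freq_of_freq.get(1, 0) / N
    n_unseen = vocab_size - len(counts)
    # Uniform per-event share -- the baseline the paper improves on
    p_unseen = unseen_mass / n_unseen if n_unseen else 0.0
    return probs, p_unseen

# Toy example: letter counts a:5, b:2, r:2, c:1, d:1 (N = 11)
counts = Counter("abracadabra")
probs, p_unseen = good_turing(counts, vocab_size=26)
```

In this toy run, N_1 = 2 of the 11 observations are singletons, so mass 2/11 is held back and split evenly over the 21 unseen letters; the paper's proposal is to make that split non-uniform, guided by the probabilistic behavior of the seen events.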