Full metadata record
DC Field: Value
dc.contributor.author: Chang, Chia-Hui
dc.contributor.author: Lui, Shao-Chen
dc.contributor.author: Wu, Yen-Chin
dc.date.accessioned: 2009-06-02T06:20:23Z
dc.date.accessioned: 2020-05-25T06:36:40Z
dc.date.available: 2009-06-02T06:20:23Z
dc.date.available: 2020-05-25T06:36:40Z
dc.date.issued: 2006-10-26T03:15:50Z
dc.date.submitted: 2000-12-08
dc.identifier.uri: http://dspace.lib.fcu.edu.tw/handle/2377/2601
dc.description.abstract: Information extraction (IE) from semi-structured Web documents is a critical issue for information integration systems on the Internet. Previous work in wrapper induction aims to solve this problem by applying machine learning to automatically generate extractors, for example WIEN, Stalker, and Softmealy. However, this approach still requires human intervention to provide training examples. Hence, the other track of information extraction tries to save human effort. For example, Embley et al. and Chang et al. present different approaches to record boundary identification for a single Web page without any training examples. Embley's work relies on the intra-page structure constructed by HTML tags (the parse tree), while Chang's work is motivated by repeated patterns formed by multiple aligned records. This paper expands Chang's work to IE and discusses the issues that arise when applying pattern discovery to record identification, including the encoding schemes of HTML and the ranking criteria of patterns used to extract record boundaries.
dc.description.sponsorship: Chung Cheng University, Chiayi County
dc.format.extent: 8p.
dc.format.extent: 227105 bytes
dc.format.mimetype: application/pdf
dc.language.iso: zh_TW
dc.relation.ispartofseries: 2000 ICS Conference
dc.subject.other: Intelligent Applications
dc.title: Semi-Structured Information Extraction Applying Automatic Pattern Discovery
Appears in collections: 2000 ICS International Computer Symposium

Files in this item:
File: ce07ics002000000052.pdf
Size: 221.78 kB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.