Semi-Structured Information Extraction Applying Automatic Pattern Discovery

Chang, Chia-Hui; Lui, Shao-Chen; Wu, Yen-Chin

Full metadata record

DC Field	Value	Language
dc.contributor.author	Chang, Chia-Hui
dc.contributor.author	Lui, Shao-Chen
dc.contributor.author	Wu, Yen-Chin
dc.date.accessioned	2009-06-02T06:20:23Z
dc.date.accessioned	2020-05-25T06:36:40Z	-
dc.date.available	2009-06-02T06:20:23Z
dc.date.available	2020-05-25T06:36:40Z	-
dc.date.issued	2006-10-26T03:15:50Z
dc.date.submitted	2000-12-08
dc.identifier.uri	http://dspace.lib.fcu.edu.tw/handle/2377/2601	-
dc.description.abstract	Information extraction (IE) from semi-structured Web documents is a critical issue for information integration systems on the Internet. Previous work in wrapper induction aims to solve this problem by applying machine learning to automatically generate extractors; examples include WIEN, Stalker, and Softmealy. However, this approach still requires human intervention to provide training examples. Hence, another track of information extraction research tries to save human effort. For example, Embley et al. and Chang et al. present different approaches to record boundary identification in a single Web page without any training examples. Embley's work relies on the intra-page structure constructed by HTML tags (the parse tree), while Chang's work is motivated by repeated patterns formed by multiple aligned records. This paper extends Chang's work to IE and discusses the issues that arise when applying pattern discovery to record identification, including the encoding schemes of HTML and the ranking criteria of patterns used to extract record boundaries.
dc.description.sponsorship	Chung Cheng University, Chiayi County
dc.format.extent	8p.
dc.format.extent	227105 bytes
dc.format.mimetype	application/pdf
dc.language.iso	zh_TW
dc.relation.ispartofseries	2000 ICS Conference
dc.subject.other	Intelligent Applications
dc.title	Semi-Structured Information Extraction Applying Automatic Pattern Discovery
Collection:	2000 ICS International Computer Symposium

Files in this item:

File	Description	Size	Format
ce07ics002000000052.pdf		221.78 kB	Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Feng Chia University Institutional Repository