Full Metadata Record
DC Field | Value | Language
dc.contributor.author | Wang, H.L.
dc.contributor.author | Hsu, W.L.
dc.contributor.author | Chen, Y.S.
dc.contributor.author | Lau, T.L.
dc.contributor.author | Tang, C.H.
dc.date.accessioned | 2009-06-02T07:23:32Z
dc.date.accessioned | 2020-05-29T06:18:13Z | -
dc.date.available | 2009-06-02T07:23:32Z
dc.date.available | 2020-05-29T06:18:13Z | -
dc.date.issued | 2006-11-13T01:31:57Z
dc.date.submitted | 1999-12-20
dc.identifier.uri | http://dspace.fcu.edu.tw/handle/2377/3124 | -
dc.description.abstract | In this paper, we propose a streamlined approach for extracting information from tables in HTML format. Our approach is based on a set of semantic templates associated with knowledge representation maps. We apply an abstract model to the templates to support the extraction of tabular logical structure in different stages. Our abstract model includes category identification, reading path construction, and record collection. In this model, we use an abstract table to separate the logical structure from the physical layout. For each table, we try to extract the abstract table from its physical layout. Our approach has three stages. In the first stage, we use semantic tagging templates to identify all possible categories of the cells in the table. In the second stage, we construct the reading paths with a cell linking algorithm. In the final stage, we perform a reverse traversal of the reading paths to extract and collect records from the table. We have implemented a prototype tabular logical structure extraction system in the MS-Windows environment. The prototype system provides an interface through which users can input a table in HTML format, and another interface that outputs the abstract table of the input table. We ran experiments with our system on several tables with distinct layout styles. Our experimental results show that the prototype system can extract the logical structure of these tables with high precision and recall.
dc.description.sponsorship | Tamkang University, Taipei County
dc.format.extent | 8p.
dc.format.extent | 587670 bytes
dc.format.mimetype | application/pdf
dc.language.iso | zh_TW
dc.relation.ispartofseries | 1999 NCS Conference
dc.subject.other | Information Extraction and Data Mining
dc.title | A Streamlined Approach for Tabular Information Extraction
Appears in Collections: 1999 NCS National Computer Symposium

Files in This Item:
File | Description | Size | Format
ce07ncs001999000116.pdf |  | 579.52 kB | Adobe PDF


All items in DSpace are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.