Title: HybridDiff: An Algorithm for A New Tree Editing Distance Problem
Authors: Wu, I-Chen
Lin, Bing-Hung
Chen, Loon-Been
Su, Jui-Yuan
Hsu, Po-Chun
Keywords: Constrained editing distance problem
Hybrid editing distance problem
Algorithm
Change detection
Tree editing distance
General editing distance problem
Journal/Conference: 2006 ICS Conference
Abstract: Change detection between documents plays an important role in many applications. Zhang and Shasha defined an editing distance problem between two ordered labeled trees T1 and T2, called the general editing distance (GED) problem in this paper, and devised an algorithm that solves it in time O(|T1||T2|H(T1)H(T2)). Here H(T) denotes min(D(T), L(T)), where D(T) is the maximum depth of tree T and L(T) is the number of leaves of T. Zhang also defined an editing distance problem, called the constrained editing distance (CED) problem in this paper, and devised an algorithm that solves it in time O(|T1||T2|). This paper proposes a new editing distance problem, called the hybrid editing distance (HED) problem, which combines the GED and CED problems. Some tree nodes, called C-nodes, follow the restrictions of the CED problem, while all other tree nodes, called G-nodes, follow the restrictions of the GED problem. In a tree T, a G-subtree Tu is defined to be a maximal connected component in T whose root is a C-node u and whose other nodes are G-nodes. This paper presents a new algorithm that solves the HED problem in time O(|T1||T2|Hmax1 Hmax2), where Hmax1 is the maximum H(Tu) over all G-subtrees Tu in T1, and Hmax2 is the maximum H(Tu) over all G-subtrees Tu in T2. When all nodes are C-nodes, that is, Hmax1 = Hmax2 = 1, the time complexity equals that of Zhang's algorithm. When all nodes are G-nodes, that is, Hmax1 = H(T1) and Hmax2 = H(T2), the time complexity equals that of Zhang and Shasha's algorithm. Finally, based on our observations of HTML files, the HED problem can be applied to the editing distances of the document trees of HTML files. In HTML files, inline elements are close to C-nodes, while block-level elements are close to G-nodes.
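The quantities in the complexity bound can be illustrated with a minimal Python sketch. It is not the HybridDiff algorithm itself and computes no editing distance; it only evaluates H(T) = min(D(T), L(T)) and the maximum H(Tu) over G-subtrees. The `Node` class and the function names are hypothetical. Two assumptions go beyond the abstract: depth is counted in nodes (so a single node has depth 1, which matches the stated all-C-node case Hmax = 1), and the tree root is always treated as the root of a G-subtree (so the all-G-node case yields H(T)).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical ordered labeled tree node; is_c_node marks a C-node."""
    label: str
    is_c_node: bool = False
    children: List["Node"] = field(default_factory=list)


def depth_and_leaves(node: Node, stop_at_c_nodes: bool) -> Tuple[int, int]:
    """Return (D, L) for the component rooted at node.

    Depth is counted in nodes (assumption). With stop_at_c_nodes=True,
    C-node children are excluded, so only the G-subtree is measured.
    """
    kids = [c for c in node.children
            if not (stop_at_c_nodes and c.is_c_node)]
    if not kids:
        return 1, 1
    stats = [depth_and_leaves(c, stop_at_c_nodes) for c in kids]
    return 1 + max(d for d, _ in stats), sum(n for _, n in stats)


def h(node: Node, stop_at_c_nodes: bool = False) -> int:
    """H(T) = min(D(T), L(T)) as defined in the abstract."""
    return min(depth_and_leaves(node, stop_at_c_nodes))


def h_max(root: Node) -> int:
    """Maximum H(Tu) over all G-subtrees Tu.

    Assumption: the tree root always starts a G-subtree, even when it
    is a G-node.
    """
    best = 0
    stack = [(root, True)]
    while stack:
        node, is_root = stack.pop()
        if is_root or node.is_c_node:
            best = max(best, h(node, stop_at_c_nodes=True))
        stack.extend((c, False) for c in node.children)
    return best


def example_tree(c_labels: set) -> Node:
    """Build r -> (a -> (a1, a2), b); labels in c_labels become C-nodes."""
    def mk(label: str, *kids: Node) -> Node:
        return Node(label, label in c_labels, list(kids))
    return mk("r", mk("a", mk("a1"), mk("a2")), mk("b"))


if __name__ == "__main__":
    print(h(example_tree(set())))          # H(T) of the whole tree
    print(h_max(example_tree({"r", "a"})))  # Hmax with two C-nodes
```

On this five-node example, marking every node as a C-node gives `h_max` equal to 1, and marking none gives `h_max` equal to `h` of the whole tree, which mirrors the two limiting cases described in the abstract.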
Date: 2007-01-31T01:55:26Z
Collection: 2006 ICS International Computer Symposium

Files in this item:
File: ce07ics002006000161.pdf
Size: 2.12 MB
Format: Adobe PDF

