Title: | HybridDiff: An Algorithm for A New Tree Editing Distance Problem |
Authors: | Wu, I-Chen; Lin, Bing-Hung; Chen, Loon-Been; Su, Jui-Yuan; Hsu, Po-Chun |
Keywords: | Constrained editing distance problem; Hybrid editing distance problem; Algorithm; Change detection; Tree editing distance; General editing distance problem |
Journal/Conference: | 2006 ICS Conference |
Abstract: | Change detection between documents plays an important role in many applications. Zhang and Shasha defined an editing distance problem between two ordered labeled trees $T_1$ and $T_2$, called the general editing distance (GED) problem in this paper, and devised an algorithm that solves it in $O(|T_1||T_2|H(T_1)H(T_2))$ time, where $H(T) = \min(D(T), L(T))$, $D(T)$ is the maximum depth of tree $T$, and $L(T)$ is the number of leaves of $T$. Zhang also defined an editing distance problem, called the constrained editing distance (CED) problem in this paper, and devised an algorithm that solves it in $O(|T_1||T_2|)$ time. This paper proposes a new editing distance problem, the hybrid editing distance (HED) problem, which combines the GED and CED problems: some tree nodes, called C-nodes, follow the restrictions of the CED problem, while all other nodes, called G-nodes, follow the restrictions of the GED problem. In a tree $T$, a G-subtree $T_u$ is a maximal connected component of $T$ whose root is a C-node $u$ and whose other nodes are all G-nodes. This paper presents a new algorithm that solves the HED problem in $O(|T_1||T_2|H^{\max}_1 H^{\max}_2)$ time, where $H^{\max}_1$ is the maximum $H(T_u)$ over all G-subtrees $T_u$ in $T_1$, and $H^{\max}_2$ is the maximum $H(T_u)$ over all G-subtrees $T_u$ in $T_2$. When all nodes are C-nodes, $H^{\max}_1 = H^{\max}_2 = 1$ and the time complexity equals that of Zhang's algorithm. When all nodes are G-nodes, $H^{\max}_1 = H(T_1)$ and $H^{\max}_2 = H(T_2)$, and the time complexity equals that of Zhang and Shasha's algorithm. Finally, based on our observations of HTML files, the HED problem can be applied to computing editing distances between the document trees of HTML files: inline elements behave much like C-nodes, while block-level elements behave much like G-nodes. |
Date: | 2007-01-31T01:55:26Z |
Collection: | 2006 International Computer Symposium (ICS) |
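To make the complexity parameters in the abstract concrete, here is a minimal Python sketch that splits a tree into G-subtrees and computes $H(T_u) = \min(D(T_u), L(T_u))$ for each one, plus the maximum over all G-subtrees. It does not implement the HybridDiff algorithm itself; it only evaluates the quantities that appear in the stated time bound. Several details are assumptions not given in the abstract: depth counts levels, so a single node has depth 1 (consistent with the all-C-node case giving 1); the tree root starts a G-subtree even when it is a G-node (consistent with the all-G-node case giving $H(T)$); and a leaf of $T_u$ is a node with no children inside $T_u$. All names (`Node`, `h`, `h_max`, `hed_bound`, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A node of an ordered labeled tree (hypothetical representation)."""
    label: str
    is_c_node: bool = False  # True: C-node (CED rules); False: G-node (GED rules)
    children: List["Node"] = field(default_factory=list)


def tree_size(node: Node) -> int:
    """|T|: number of nodes in the whole tree rooted at `node`."""
    return 1 + sum(tree_size(c) for c in node.children)


def g_subtree_roots(root: Node) -> List[Node]:
    """Roots u of all G-subtrees T_u: every C-node, plus the tree root.
    ASSUMPTION: the root starts a G-subtree even if it is a G-node, so an
    all-G-node tree forms a single G-subtree."""
    roots: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node is root or node.is_c_node:
            roots.append(node)
        stack.extend(node.children)
    return roots


def inner_children(node: Node) -> List[Node]:
    """Children that stay in the same G-subtree: only G-nodes extend it,
    while a C-node child starts a G-subtree of its own."""
    return [c for c in node.children if not c.is_c_node]


def depth_and_leaves(node: Node) -> Tuple[int, int]:
    """(D, L) of the G-subtree part rooted at `node`.
    ASSUMPTION: depth counts levels (a single node has depth 1), and a leaf
    is a node with no children inside the same G-subtree."""
    kids = inner_children(node)
    if not kids:
        return 1, 1
    stats = [depth_and_leaves(c) for c in kids]
    return 1 + max(d for d, _ in stats), sum(n for _, n in stats)


def h(u: Node) -> int:
    """H(T_u) = min(D(T_u), L(T_u)) for the G-subtree rooted at u."""
    d, leaves = depth_and_leaves(u)
    return min(d, leaves)


def h_max(root: Node) -> int:
    """Maximum H(T_u) over all G-subtrees T_u of the tree."""
    return max(h(u) for u in g_subtree_roots(root))


def hed_bound(t1: Node, t2: Node) -> int:
    """|T1| * |T2| * Hmax_1 * Hmax_2, the quantity inside the stated O(...)
    bound (a count for comparison, not a measured running time)."""
    return tree_size(t1) * tree_size(t2) * h_max(t1) * h_max(t2)
```

As a check against the two limiting cases in the abstract: if every node is a G-node, the only G-subtree is the whole tree and `h_max` returns $\min(D(T), L(T))$; if every node is a C-node, each G-subtree is a single node and `h_max` returns 1.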
Files in This Item:
File | Description | Size | Format
---|---|---|---
ce07ics002006000161.pdf | | 2.12 MB | Adobe PDF