TY - GEN
T1 - X3-Miner: Mining Patterns from XML Database
AU - Tan, H.
AU - Dillon, T.
AU - Feng, L.
AU - Chang, E.
AU - Hadzic, F.
N1 - Imported from EWI/DB PMS [db-utwente:inpr:0000003652]
PY - 2005
Y1 - 2005
N2 - An XML-enabled framework for the representation of association rules in databases was first presented in [4].
In Frequent Structure Mining (FSM), one popular approach is graph matching, which uses data structures such as the adjacency matrix [7] or the adjacency list [8].
Another approach represents semistructured tree-like structures using a string representation, which is more space-efficient and relatively easy to manipulate [10].
However, mining association rules from XML faces additional challenges due to the inherent flexibility of XML in both structure and semantics, such as: 1) a more complicated hierarchical data structure; 2) an ordered data context; and 3) a much larger data size.
To tackle these challenges, we propose an approach, X3-Miner, that efficiently extracts patterns from a large XML data set and overcomes the challenges by: (1) exploring the use of a model-validating approach to reduce the number of candidates generated, taking into account the semantics embedded in the tree-like structure of an XML database so that only valid candidates are obtained from it; (2) minimising I/O overhead by intersecting the XML database with the frequent 1-itemset, which results in a frequent 1-itemset XML tree, and by progressively trimming k-itemsets that contain infrequent (k-1)-itemsets; and (3) extending the notion of the string representation of a tree structure proposed in [10] to xstring, which describes an XML document without loss of either structure or semantics.
Such an extension enables easier traversal of the tree-structured XML data during our model-validating candidate generation.
Our experiments with both synthetic and real-life data sets demonstrate the effectiveness of the proposed model-validating approach in mining XML data.
AB - An XML-enabled framework for the representation of association rules in databases was first presented in [4].
In Frequent Structure Mining (FSM), one popular approach is graph matching, which uses data structures such as the adjacency matrix [7] or the adjacency list [8].
Another approach represents semistructured tree-like structures using a string representation, which is more space-efficient and relatively easy to manipulate [10].
However, mining association rules from XML faces additional challenges due to the inherent flexibility of XML in both structure and semantics, such as: 1) a more complicated hierarchical data structure; 2) an ordered data context; and 3) a much larger data size.
To tackle these challenges, we propose an approach, X3-Miner, that efficiently extracts patterns from a large XML data set and overcomes the challenges by: (1) exploring the use of a model-validating approach to reduce the number of candidates generated, taking into account the semantics embedded in the tree-like structure of an XML database so that only valid candidates are obtained from it; (2) minimising I/O overhead by intersecting the XML database with the frequent 1-itemset, which results in a frequent 1-itemset XML tree, and by progressively trimming k-itemsets that contain infrequent (k-1)-itemsets; and (3) extending the notion of the string representation of a tree structure proposed in [10] to xstring, which describes an XML document without loss of either structure or semantics.
Such an extension enables easier traversal of the tree-structured XML data during our model-validating candidate generation.
Our experiments with both synthetic and real-life data sets demonstrate the effectiveness of the proposed model-validating approach in mining XML data.
KW - EWI-7337
KW - IR-63536
KW - METIS-229568
KW - DB-DM: DATA MINING
M3 - Conference contribution
SN - 1-84564-017-9
T3 - Information and Communication Technologies
SP - 287
EP - 296
BT - Data Mining VI: Data Mining, Text Mining and their Business Applications
PB - WIT Press
CY - Ashurst, Southampton, UK
T2 - Data Mining VI: Data Mining, Text Mining and their Business Applications
Y2 - 25 May 2005 through 27 May 2005
ER -