Specifications

13.
Accession number: 20130215879946
Title: Mining frequent itemsets based on a vertical bit-vector dot-product CBD-tree
Authors: Yao, Quanzhu1 ; Zhang, Yubing1 ; Zhang, Jiulong1/姚全珠;张玉兵;张九龙
Author affiliation:
1 School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
Corresponding author: Yao, Q. (qzyao@xaut.edu.cn)
Source title: Journal of Convergence Information Technology
Abbreviated source title: J. Convergence Inf. Technol.
Volume: 7
Issue: 23
Issue date: December 2012
Publication year: 2012
Pages: 393-399
Language: English
ISSN: 19759320
E-ISSN: 22339299
Document type: Journal article (JA)
Publisher: Advanced Institute of Convergence Information Technology, Myoungbo Bldg
3F, Bumin-dong 1-ga, Seo-gu, Busan, 602-816, Korea, Republic of
Abstract: Efficient algorithms for mining frequent itemsets are crucial for mining association
rules as well as for many other data mining tasks. However, traditional algorithms produce a
large number of candidate frequent itemsets. This paper proposes a new algorithm that combines
breadth-first and depth-first search strategies, avoids generating a large number of candidate
frequent itemsets, and reduces unnecessary operations. The proposed algorithm generates
frequent patterns using a vertical bit-vector dot-product and a tree built by combining
breadth-first with depth-first search (the CBD-tree). With the vertical bit-vector dot-product,
the overhead of computing itemset frequencies is reduced because efficient arithmetic on the
bit-vectors replaces comparisons. The CBD-tree is built in three steps. First, the frequent
1-itemsets L1 and the frequent 2-itemsets L2 are generated breadth-first, and effective pruning
strategies are designed that make full use of L2. Second, the CBD-tree is created depth-first by
copying the already generated subtrees of frequent itemsets. Finally, a large number of improper
candidate itemsets are ruled out, and the frequent itemsets are generated as the tree is built.
Experimental results show that the proposed algorithm is more efficient because it reduces both
the storage space required for the database and the time needed to generate the frequent itemsets.
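The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' CBD-tree implementation: the subtree-copying step is replaced by plain depth-first recursion, and the function names (`vertical_bitvectors`, `support`, `mine`) and the use of Python integers as bit-vectors are assumptions made for illustration. It shows only the two ideas the abstract names: counting support as the dot product of 0/1 vectors (here, the popcount of their bitwise AND) and using the breadth-first set L2 to prune depth-first extensions.

```python
from itertools import combinations


def vertical_bitvectors(transactions):
    """Map each item to an int whose bit t is set when transaction t contains it."""
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors


def support(bits):
    # The popcount of AND-ed bit-vectors equals the dot product of the 0/1 vectors.
    return bin(bits).count("1")


def mine(transactions, min_support):
    """Return {itemset tuple: support} for all itemsets with support >= min_support."""
    vectors = vertical_bitvectors(transactions)

    # Breadth-first step 1: frequent 1-itemsets (L1), kept in sorted order.
    l1 = sorted(i for i, v in vectors.items() if support(v) >= min_support)
    frequent = {(i,): support(vectors[i]) for i in l1}

    # Breadth-first step 2: frequent 2-itemsets (L2).
    l2 = {}
    for a, b in combinations(l1, 2):
        s = support(vectors[a] & vectors[b])
        if s >= min_support:
            l2[(a, b)] = s
    frequent.update(l2)

    def extend(prefix, bits, candidates):
        # Depth-first growth of `prefix` by items that sort after its last item.
        for k, item in enumerate(candidates):
            # L2-based pruning: every pair (x, item) must itself be frequent.
            if any((x, item) not in l2 for x in prefix):
                continue
            new_bits = bits & vectors[item]
            s = support(new_bits)
            if s >= min_support:
                itemset = prefix + (item,)
                frequent[itemset] = s
                extend(itemset, new_bits, candidates[k + 1:])

    for a, b in l2:
        tail = [c for c in l1 if c > b]
        extend((a, b), vectors[a] & vectors[b], tail)
    return frequent
```

For example, calling `mine` on a list of item sets with a minimum support count returns a dictionary keyed by sorted item tuples. The L2 check skips a candidate before any bit-vector is AND-ed, which is where the pruning saves work.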
Number of references: 10
Main heading: Trees (mathematics)
Controlled terms: Algorithms - Data mining - Digital storage - Forestry -
Vectors
Uncontrolled terms: Breadth-first - Data mining tasks - Depth first - Depth
first search - Dot-product - Item sets - Mining associations - Mining frequent
itemsets - Pruning strategy - Storage spaces - Subtrees - Vertical bit-vector
Classification code: 722.1 Data Storage, Equipment and Techniques - 723 Computer
Software, Data Handling and Applications - 821.0 Woodlands and Forestry - 921