Specifications

13.

Accession number: 20130215879946

Title: Mining frequent itemsets based on a vertical bit-vector dot-product CBD-tree

Authors: Yao, Quanzhu (1); Zhang, Yubing (1); Zhang, Jiulong (1) / 姚全珠; 张玉兵; 张九龙

Author affiliation:

(1) School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China

Corresponding author: Yao, Q. (qzyao@xaut.edu.cn)

Source title: Journal of Convergence Information Technology

Abbreviated source title: J. Convergence Inf. Technol.

Volume: 7

Issue: 23

Issue date: December 2012

Publication year: 2012

Pages: 393-399

Language: English

ISSN: 19759320

E-ISSN: 22339299

Document type: Journal article (JA)

Publisher: Advanced Institute of Convergence Information Technology, Myoungbo Bldg 3F, Bumin-dong 1-ga, Seo-gu, Busan, 602-816, Korea, Republic of

Abstract: Efficient algorithms for mining frequent itemsets are crucial for mining association rules and for many other data mining tasks. However, traditional algorithms produce a large number of candidate frequent itemsets. This paper proposes a new algorithm that combines breadth-first and depth-first search strategies; it avoids generating a large number of candidate frequent itemsets and reduces unnecessary operations. The proposed algorithm generates frequent patterns using a vertical bit-vector dot-product and a tree built by combining breadth-first with depth-first search (the CBD-tree). With the vertical bit-vector dot-product, the overhead of computing itemset frequencies decreases because efficient digit arithmetic replaces comparisons. For the CBD-tree, first, the frequent 1-itemsets L1 and the frequent 2-itemsets L2 are generated breadth-first, and effective pruning strategies are designed that make full use of L2. Second, the CBD-tree is created depth-first by copying the already generated subtrees of frequent itemsets. Finally, a large number of improper candidate itemsets are ruled out, and the frequent itemsets are generated during tree building. The experimental results show that the proposed algorithm is more efficient because it reduces both the storage space of the database and the time needed to generate the frequent itemsets.
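Illustrative note: the two ideas named in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' CBD-tree: each item's vertical bit-vector is stored as a Python integer, the dot product of two 0/1 vectors is computed as the number of 1 bits in their bitwise AND, L1 and L2 are built breadth-first, and longer itemsets are grown depth-first with an L2-based pruning check. The subtree-copying step of the paper is not reproduced, and all function names (vertical_bitvectors, support, mine) are hypothetical.

```python
from itertools import combinations


def vertical_bitvectors(transactions):
    """Map each item to an int whose bit t is 1 iff transaction t contains it."""
    vectors = {}
    for t, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << t)
    return vectors


def support(vec):
    """Count the 1 bits; for an AND of 0/1 vectors this equals their dot product."""
    return bin(vec).count("1")


def mine(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent itemsets (sketch)."""
    vectors = vertical_bitvectors(transactions)

    # Breadth-first phase: frequent 1-itemsets (L1) and 2-itemsets (L2).
    l1 = sorted(i for i, v in vectors.items() if support(v) >= min_support)
    l2 = {
        (a, b)
        for a, b in combinations(l1, 2)
        if support(vectors[a] & vectors[b]) >= min_support
    }
    frequent = {(i,): support(vectors[i]) for i in l1}

    # Depth-first phase: extend a prefix one item at a time.
    def extend(prefix, vec, candidates):
        for k, item in enumerate(candidates):
            # Assumed pruning rule: skip the extension unless every pair
            # (p, item) with p in the prefix is already in L2.
            if any((p, item) not in l2 for p in prefix):
                continue
            new_vec = vec & vectors[item]
            count = support(new_vec)
            if count >= min_support:
                itemset = prefix + (item,)
                frequent[itemset] = count
                extend(itemset, new_vec, candidates[k + 1:])

    for k, item in enumerate(l1):
        extend((item,), vectors[item], l1[k + 1:])
    return frequent
```

For example, on the five transactions {a, b, c}, {a, b}, {a, c}, {b, c}, {a, b, c} with a minimum support count of 3, this sketch reports a, b and c with support 4 and each pair with support 3; the triple {a, b, c} has support 2 and is rejected.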

Number of references: 10

Main heading: Trees (mathematics)

Controlled terms: Algorithms - Data mining - Digital storage - Forestry - Vectors

Uncontrolled terms: Breadth-first - Data mining tasks - Depth first - Depth first search - Dot-product - Item sets - Mining associations - Mining frequent itemsets - Pruning strategy - Storage spaces - Subtrees - Vertical bit-vector

Classification code: 722.1 Data Storage, Equipment and Techniques - 723 Computer Software, Data Handling and Applications - 821.0 Woodlands and Forestry - 921