### Abstract

Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and topdown searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicite examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the Mgorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.

Original language | English (US) |
---|---|

Title of host publication | Advances in Database Technology, EDBT 1998 - 6th International Conference on Extending Database Technology, Proceedings |

Pages | 105-119 |

Number of pages | 15 |

Volume | 1377 LNCS |

State | Published - 1998 |

Event | 6th International Conference on Extending Database Technology, EDBT 1998 - Valencia, Spain Duration: Mar 23 1998 → Mar 27 1998 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|

Volume | 1377 LNCS |

ISSN (Print) | 03029743 |

ISSN (Electronic) | 16113349 |

### Other

Other | 6th International Conference on Extending Database Technology, EDBT 1998 |
---|---|

Country | Spain |

City | Valencia |

Period | 3/23/98 → 3/27/98 |

### Fingerprint

### ASJC Scopus subject areas

- Computer Science(all)
- Theoretical Computer Science

### Cite this

*Advances in Database Technology, EDBT 1998 - 6th International Conference on Extending Database Technology, Proceedings*(Vol. 1377 LNCS, pp. 105-119). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1377 LNCS).

**Pincer-search : A new algorithm for discovering the maximum frequent set.** / Lin, Dao I.; Kedem, Zvi M.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Advances in Database Technology, EDBT 1998 - 6th International Conference on Extending Database Technology, Proceedings.*vol. 1377 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 1377 LNCS, pp. 105-119, 6th International Conference on Extending Database Technology, EDBT 1998, Valencia, Spain, 3/23/98.

}

TY - GEN

T1 - Pincer-search

T2 - A new algorithm for discovering the maximum frequent set

AU - Lin, Dao I.

AU - Kedem, Zvi M.

PY - 1998

Y1 - 1998

N2 - Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and topdown searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicite examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the Mgorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.

AB - Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and topdown searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicite examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the Mgorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.

UR - http://www.scopus.com/inward/record.url?scp=84890521199&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890521199&partnerID=8YFLogxK

M3 - Conference contribution

SN - 3540642641

SN - 9783540642640

VL - 1377 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 105

EP - 119

BT - Advances in Database Technology, EDBT 1998 - 6th International Conference on Extending Database Technology, Proceedings

ER -