Pincer-search: A new algorithm for discovering the maximum frequent set

Dao I. Lin, Zvi M. Kedem

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicit examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the algorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.
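
The two ideas in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The function names are hypothetical, the data is a toy example, and the update rule in `update_mfcs` is an assumption based on how the maximum frequent candidate set is commonly described, since the abstract does not spell it out. The first two functions show why the maximum frequent set specifies all frequent itemsets; the third shows one plausible top-down update when the bottom-up search finds an infrequent itemset.

```python
from itertools import combinations


def is_frequent(itemset, maximum_frequent_set):
    """An itemset is frequent iff some maximal frequent itemset contains it."""
    s = frozenset(itemset)
    return any(s <= m for m in maximum_frequent_set)


def all_frequent_itemsets(maximum_frequent_set):
    """Enumerate every non-empty frequent itemset implied by the maximal ones."""
    found = set()
    for m in maximum_frequent_set:
        items = sorted(m)
        for k in range(1, len(items) + 1):
            found.update(frozenset(c) for c in combinations(items, k))
    return found


def update_mfcs(mfcs, infrequent):
    """Assumed top-down step: split every candidate containing an infrequent itemset.

    A candidate that contains the infrequent itemset cannot be frequent, so it
    is replaced by the subsets obtained by dropping one item of the infrequent
    itemset at a time, keeping only those not already covered.
    """
    infrequent = frozenset(infrequent)
    result = {m for m in mfcs if not infrequent <= m}
    for m in mfcs:
        if infrequent <= m:
            for item in infrequent:
                smaller = m - {item}
                if not any(smaller <= other for other in result):
                    result.add(smaller)
    return result


# Toy maximum frequent set: {a, b, c} and {c, d}.
mfs = {frozenset("abc"), frozenset("cd")}
frequent = all_frequent_itemsets(mfs)

# Toy candidate set: start from all items, then learn that {1, 3} is infrequent.
mfcs = update_mfcs({frozenset({1, 2, 3, 4, 5})}, {1, 3})
```

In the toy data, `is_frequent("ab", mfs)` holds because `{a, b}` is a subset of `{a, b, c}`, while `{a, d}` is contained in no maximal itemset and is therefore not frequent. After the update, the single candidate `{1, 2, 3, 4, 5}` is replaced by `{2, 3, 4, 5}` and `{1, 2, 4, 5}`.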

Original language: English (US)
Title of host publication: Advances in Database Technology, EDBT 1998 - 6th International Conference on Extending Database Technology, Proceedings
Pages: 105-119
Number of pages: 15
Volume: 1377 LNCS
State: Published - 1998
Event: 6th International Conference on Extending Database Technology, EDBT 1998 - Valencia, Spain
Duration: Mar 23 1998 to Mar 27 1998

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1377 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th International Conference on Extending Database Technology, EDBT 1998
Country: Spain
City: Valencia
Period: 3/23/98 to 3/27/98

Fingerprint

Frequent Itemsets
Bottom-up
Association rules
Breadth-first Search
Data mining
Data structures
Updating
Immediately
Continue
Benchmark
Decrease

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Lin, D. I., & Kedem, Z. M. (1998). Pincer-search: A new algorithm for discovering the maximum frequent set. In Advances in Database Technology, EDBT 1998 - 6th International Conference on Extending Database Technology, Proceedings (Vol. 1377 LNCS, pp. 105-119). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1377 LNCS).

Pincer-search: A new algorithm for discovering the maximum frequent set. / Lin, Dao I.; Kedem, Zvi M.

Advances in Database Technology, EDBT 1998 - 6th International Conference on Extending Database Technology, Proceedings. Vol. 1377 LNCS 1998. p. 105-119 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1377 LNCS).

Lin, DI & Kedem, ZM 1998, Pincer-search: A new algorithm for discovering the maximum frequent set. in Advances in Database Technology, EDBT 1998 - 6th International Conference on Extending Database Technology, Proceedings. vol. 1377 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 1377 LNCS, pp. 105-119, 6th International Conference on Extending Database Technology, EDBT 1998, Valencia, Spain, 3/23/98.
Lin DI, Kedem ZM. Pincer-search: A new algorithm for discovering the maximum frequent set. In Advances in Database Technology, EDBT 1998 - 6th International Conference on Extending Database Technology, Proceedings. Vol. 1377 LNCS. 1998. p. 105-119. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Lin, Dao I.; Kedem, Zvi M. / Pincer-search: A new algorithm for discovering the maximum frequent set. Advances in Database Technology, EDBT 1998 - 6th International Conference on Extending Database Technology, Proceedings. Vol. 1377 LNCS 1998. pp. 105-119 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{09850babfbc7469987aa517fdab9349b,
title = "Pincer-search: A new algorithm for discovering the maximum frequent set",
abstract = "Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicit examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the algorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.",
author = "Lin, {Dao I.} and Kedem, {Zvi M.}",
year = "1998",
language = "English (US)",
isbn = "3540642641",
volume = "1377 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "105--119",
booktitle = "Advances in Database Technology, EDBT 1998 - 6th International Conference on Extending Database Technology, Proceedings",

}

TY - GEN

T1 - Pincer-search

T2 - A new algorithm for discovering the maximum frequent set

AU - Lin, Dao I.

AU - Kedem, Zvi M.

PY - 1998

Y1 - 1998

N2 - Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down searches. The primary search direction is still bottom-up, but a restricted search is also conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidate set. It is used to prune candidates in the bottom-up search. A very important characteristic of the algorithm is that it does not require explicit examination of every frequent itemset. Therefore the algorithm performs well even when some maximal frequent itemsets are long. As its output, the algorithm produces the maximum frequent set, i.e., the set containing all maximal frequent itemsets, which therefore specifies immediately all frequent itemsets. We evaluate the performance of the algorithm using a well-known benchmark database. The improvements can be up to several orders of magnitude, compared to the best current algorithms.

UR - http://www.scopus.com/inward/record.url?scp=84890521199&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890521199&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84890521199

SN - 3540642641

SN - 9783540642640

VL - 1377 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 105

EP - 119

BT - Advances in Database Technology, EDBT 1998 - 6th International Conference on Extending Database Technology, Proceedings

ER -