Antipole tree indexing to support range search and k-nearest neighbor search in metric spaces

Domenico Cantone, Alfredo Ferro, Alfredo Pulvirenti, Diego Reforgiato Recupero, Dennis Shasha

Research output: Contribution to journal, Article

Abstract

Range and k-nearest neighbor searching are core problems in pattern recognition. Given a database S of objects in a metric space M and a query object q in M, in a range searching problem the goal is to find the objects of S within some threshold distance to q, whereas in a k-nearest neighbor searching problem, the k elements of S closest to q must be produced. These problems can obviously be solved with a linear number of distance calculations, by comparing the query object against every object in the database. However, the goal is to solve such problems much faster. We combine and extend ideas from the M-Tree, the Multivantage Point structure, and the FQ-Tree to create a new structure in the "bisector tree" class, called the Antipole Tree. Bisection is based on the proximity to an "Antipole" pair of elements generated by a suitable linear randomized tournament. The final winners a, b of such a tournament are far enough apart to approximate the diameter of the splitting set. If dist(a, b) is larger than the chosen cluster diameter threshold, then the cluster is split. The proposed data structure is an indexing scheme suitable for (exact and approximate) best match searching on generic metric spaces. The Antipole Tree outperforms by a factor of approximately two existing structures such as List of Clusters, M-Trees, and others and, in many cases, it achieves better clustering properties.
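
The tournament, the diameter-threshold split, and a range query described in the abstract can be illustrated with a simplified sketch. It is not the authors' implementation: the group size of 3, the dictionary node layout, the covering radii ra and rb, and all function names are assumptions introduced here for illustration, and the pruning rule is the generic triangle-inequality test for bisector trees rather than the paper's full set of optimizations.

```python
import random
from itertools import combinations


def farthest_pair(points, dist):
    """Exact farthest pair by brute force (quadratic, so only used on small sets)."""
    return max(combinations(points, 2), key=lambda pair: dist(pair[0], pair[1]))


def antipole_pair(points, dist, group_size=3, rng=random):
    """Randomized tournament: each round keeps the farthest pair of every small
    random group, so the candidate set shrinks geometrically and the total
    number of distance computations stays linear in len(points)."""
    if group_size < 3:
        raise ValueError("group_size must be at least 3 so each round shrinks the set")
    survivors = list(points)
    while len(survivors) > 2 * group_size:
        rng.shuffle(survivors)
        winners = []
        for i in range(0, len(survivors), group_size):
            group = survivors[i:i + group_size]
            # A leftover group with a single element passes through unchanged.
            winners.extend(group if len(group) < 2 else farthest_pair(group, dist))
        survivors = winners
    return farthest_pair(survivors, dist)


def build(points, dist, diameter_threshold, rng=random):
    """Split a cluster only if its approximate diameter dist(a, b) exceeds the threshold."""
    if len(points) < 2:
        return {"leaf": list(points)}
    a, b = antipole_pair(points, dist, rng=rng)
    if dist(a, b) <= diameter_threshold:
        return {"leaf": list(points)}
    left, right = [], []
    for p in points:
        # Bisection by proximity to the Antipole pair.
        (left if dist(p, a) <= dist(p, b) else right).append(p)
    return {
        "a": a, "b": b,
        "ra": max(dist(p, a) for p in left),   # covering radius around a
        "rb": max(dist(p, b) for p in right),  # covering radius around b
        "left": build(left, dist, diameter_threshold, rng),
        "right": build(right, dist, diameter_threshold, rng),
    }


def range_search(node, q, radius, dist):
    """Return all stored objects within `radius` of q, skipping subtrees
    that the triangle inequality rules out."""
    if "leaf" in node:
        return [p for p in node["leaf"] if dist(p, q) <= radius]
    found = []
    if dist(q, node["a"]) <= node["ra"] + radius:
        found += range_search(node["left"], q, radius, dist)
    if dist(q, node["b"]) <= node["rb"] + radius:
        found += range_search(node["right"], q, radius, dist)
    return found
```

For example, with 2D points and math.dist as the metric, build(points, math.dist, 0.2) followed by range_search(tree, q, 0.15, math.dist) should return the same objects as a linear scan, typically after visiting only part of the tree.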

Original language: English (US)
Pages (from-to): 535-550
Number of pages: 16
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 17
Issue number: 4
DOI: 10.1109/TKDE.2005.53
State: Published - Apr 2005

Fingerprint

Pattern recognition
Data structures
Nearest neighbor search

Keywords

  • Indexing methods
  • Information search and retrieval
  • Similarity measures

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence
  • Information Systems

Cite this

Antipole tree indexing to support range search and k-nearest neighbor search in metric spaces. / Cantone, Domenico; Ferro, Alfredo; Pulvirenti, Alfredo; Recupero, Diego Reforgiato; Shasha, Dennis.

In: IEEE Transactions on Knowledge and Data Engineering, Vol. 17, No. 4, 04.2005, p. 535-550.

Cantone, Domenico ; Ferro, Alfredo ; Pulvirenti, Alfredo ; Recupero, Diego Reforgiato ; Shasha, Dennis. / Antipole tree indexing to support range search and k-nearest neighbor search in metric spaces. In: IEEE Transactions on Knowledge and Data Engineering. 2005 ; Vol. 17, No. 4. pp. 535-550.
@article{7a020d2850094648ac488cb323da276a,
title = "Antipole tree indexing to support range search and k-nearest neighbor search in metric spaces",
abstract = "Range and k-nearest neighbor searching are core problems in pattern recognition. Given a database S of objects in a metric space M and a query object q in M, in a range searching problem the goal is to find the objects of S within some threshold distance to q, whereas in a k-nearest neighbor searching problem, the k elements of S closest to q must be produced. These problems can obviously be solved with a linear number of distance calculations, by comparing the query object against every object in the database. However, the goal is to solve such problems much faster. We combine and extend ideas from the M-Tree, the Multivantage Point structure, and the FQ-Tree to create a new structure in the {"}bisector tree{"} class, called the Antipole Tree. Bisection is based on the proximity to an {"}Antipole{"} pair of elements generated by a suitable linear randomized tournament. The final winners a, b of such a tournament are far enough apart to approximate the diameter of the splitting set. If dist(a, b) is larger than the chosen cluster diameter threshold, then the cluster is split. The proposed data structure is an indexing scheme suitable for (exact and approximate) best match searching on generic metric spaces. The Antipole Tree outperforms by a factor of approximately two existing structures such as List of Clusters, M-Trees, and others and, in many cases, it achieves better clustering properties.",
keywords = "Indexing methods, Information search and retrieval, Similarity measures",
author = "Domenico Cantone and Alfredo Ferro and Alfredo Pulvirenti and Recupero, {Diego Reforgiato} and Dennis Shasha",
year = "2005",
month = "4",
doi = "10.1109/TKDE.2005.53",
language = "English (US)",
volume = "17",
pages = "535--550",
journal = "IEEE Transactions on Knowledge and Data Engineering",
issn = "1041-4347",
publisher = "IEEE Computer Society",
number = "4",

}

TY - JOUR

T1 - Antipole tree indexing to support range search and k-nearest neighbor search in metric spaces

AU - Cantone, Domenico

AU - Ferro, Alfredo

AU - Pulvirenti, Alfredo

AU - Recupero, Diego Reforgiato

AU - Shasha, Dennis

PY - 2005/4

Y1 - 2005/4

AB - Range and k-nearest neighbor searching are core problems in pattern recognition. Given a database S of objects in a metric space M and a query object q in M, in a range searching problem the goal is to find the objects of S within some threshold distance to q, whereas in a k-nearest neighbor searching problem, the k elements of S closest to q must be produced. These problems can obviously be solved with a linear number of distance calculations, by comparing the query object against every object in the database. However, the goal is to solve such problems much faster. We combine and extend ideas from the M-Tree, the Multivantage Point structure, and the FQ-Tree to create a new structure in the "bisector tree" class, called the Antipole Tree. Bisection is based on the proximity to an "Antipole" pair of elements generated by a suitable linear randomized tournament. The final winners a, b of such a tournament are far enough apart to approximate the diameter of the splitting set. If dist(a, b) is larger than the chosen cluster diameter threshold, then the cluster is split. The proposed data structure is an indexing scheme suitable for (exact and approximate) best match searching on generic metric spaces. The Antipole Tree outperforms by a factor of approximately two existing structures such as List of Clusters, M-Trees, and others and, in many cases, it achieves better clustering properties.

KW - Indexing methods

KW - Information search and retrieval

KW - Similarity measures

UR - http://www.scopus.com/inward/record.url?scp=13244278960&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=13244278960&partnerID=8YFLogxK

U2 - 10.1109/TKDE.2005.53

DO - 10.1109/TKDE.2005.53

M3 - Article

AN - SCOPUS:13244278960

VL - 17

SP - 535

EP - 550

JO - IEEE Transactions on Knowledge and Data Engineering

JF - IEEE Transactions on Knowledge and Data Engineering

SN - 1041-4347

IS - 4

ER -