Influence sets based on reverse nearest neighbor queries

Flip Korn, Shanmugavelayutham Muthukrishnan

    Research output: Contribution to journal › Article

    Abstract

    Inherent in the operation of many decision support and continuous referral systems is the notion of the “influence” of a data point on the database. This notion arises in examples such as finding the set of customers affected by the opening of a new store outlet location, notifying the subset of subscribers to a digital library who will find a newly added document most relevant, etc. Standard approaches to determining the influence set of a data point involve range searching and nearest neighbor queries. In this paper, we formalize a novel notion of influence based on reverse nearest neighbor queries and their variants. Since the nearest neighbor relation is not symmetric, the set of points that are closest to a query point (i.e., the nearest neighbors) differs from the set of points that have the query point as their nearest neighbor (called the reverse nearest neighbors). Influence sets based on reverse nearest neighbor (RNN) queries seem to capture the intuitive notion of influence from our motivating examples. We present a general approach for solving RNN queries and, based on this approach, an efficient R-tree-based method for large data sets. Although the RNN query appears to be natural, it has not been studied previously. RNN queries are of independent interest, and as such should be part of the suite of available queries for processing spatial and multimedia data. In our experiments with real geographical data, the proposed method appears to scale logarithmically, whereas a straightforward sequential scan scales linearly. Our experimental study also shows that approaches based on range searching or nearest neighbors are ineffective at finding the influence sets of interest.
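The asymmetry described in the abstract can be illustrated with a small brute-force sketch. This is not the R-tree based method from the paper; it is only the straightforward sequential scan, written under some assumptions: points are 2-D tuples, distance is Euclidean, a tie with the query point counts as membership, and the function names `nearest_neighbor` and `reverse_nearest_neighbors` are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_neighbor(q, points):
    """Index of the data point closest to the query point q."""
    return min(range(len(points)), key=lambda i: dist(q, points[i]))


def reverse_nearest_neighbors(q, points):
    """Indices of data points that would have q as their nearest neighbor.

    Sequential scan: for every point p, compare its distance to q with its
    distance to every other data point (quadratic in the number of points).
    Assumption: a tie between q and another point still counts p as an RNN.
    """
    result = []
    for i, p in enumerate(points):
        d_q = dist(p, q)
        others = (o for j, o in enumerate(points) if j != i)
        if all(dist(p, o) >= d_q for o in others):
            result.append(i)
    return result


# Illustrative data (made up): three points on a line and a query at the origin.
points = [(2.0, 0.0), (3.0, 0.0), (-5.0, 0.0)]
q = (0.0, 0.0)

nn = nearest_neighbor(q, points)            # closest point to q
rnn = reverse_nearest_neighbors(q, points)  # points that have q as their NN
```

With this data, the nearest neighbor of `q` is the point at index 0, but that point is closer to its neighbor at index 1 than to `q`, so it is not a reverse nearest neighbor. Only the isolated point at index 2 has `q` as its nearest neighbor, which is why the two sets differ.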

    Original language: English (US)
    Pages (from-to): 201-212
    Number of pages: 12
    Journal: SIGMOD Record (ACM Special Interest Group on Management of Data)
    Volume: 29
    Issue number: 2
    DOI: 10.1145/335191.335415
    State: Published - Jan 1 2000

    Fingerprint

    Digital libraries
    Processing
    Experiments

    ASJC Scopus subject areas

    • Software
    • Information Systems

    Cite this

    Influence sets based on reverse nearest neighbor queries. / Korn, Flip; Muthukrishnan, Shanmugavelayutham.

    In: SIGMOD Record (ACM Special Interest Group on Management of Data), Vol. 29, No. 2, 01.01.2000, p. 201-212.

    @article{dbc07fb1bcac4364b386d537068266fc,
    title = "Influence sets based on reverse nearest neighbor queries",
    author = "Flip Korn and Shanmugavelayutham Muthukrishnan",
    year = "2000",
    month = "1",
    day = "1",
    doi = "10.1145/335191.335415",
    language = "English (US)",
    volume = "29",
    pages = "201--212",
    journal = "SIGMOD Record",
    issn = "0163-5808",
    publisher = "Association for Computing Machinery (ACM)",
    number = "2",

    }

    TY - JOUR

    T1 - Influence sets based on reverse nearest neighbor queries

    AU - Korn, Flip

    AU - Muthukrishnan, Shanmugavelayutham

    PY - 2000/1/1

    Y1 - 2000/1/1

    UR - http://www.scopus.com/inward/record.url?scp=0039845446&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0039845446&partnerID=8YFLogxK

    U2 - 10.1145/335191.335415

    DO - 10.1145/335191.335415

    M3 - Article

    AN - SCOPUS:0039845446

    VL - 29

    SP - 201

    EP - 212

    JO - SIGMOD Record

    JF - SIGMOD Record

    SN - 0163-5808

    IS - 2

    ER -