Exploiting product distributions to identify relevant variables of correlation immune functions

Lisa Hellerstein, Bernard Roseli, Eric Bach, Soumya Ray, David Page

    Research output: Contribution to journal (Article)

    Abstract

    A Boolean function f is correlation immune if each input variable is independent of the output, under the uniform distribution on inputs. For example, the parity function is correlation immune. We consider the problem of identifying relevant variables of a correlation immune function, in the presence of irrelevant variables. We address this problem in two different contexts. First, we analyze Skewing, a heuristic method that was developed to improve the ability of greedy decision tree algorithms to identify relevant variables of correlation immune Boolean functions, given examples drawn from the uniform distribution (Page and Ray, 2003). We present theoretical results revealing both the capabilities and limitations of skewing. Second, we explore the problem of identifying relevant variables in the Product Distribution Choice (PDC) learning model, a model in which the learner can choose product distributions and obtain examples from them. We prove a lemma establishing a property of Boolean functions that may be of independent interest. Using this lemma, we give two new algorithms for finding relevant variables of correlation immune functions in the PDC model.
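The definition of correlation immunity, and the reason non-uniform product distributions can help, can be checked by brute force on a small case. The sketch below is illustrative only and is not taken from the paper: the helper `dependence`, the three-variable target `f`, and the bias value 0.75 are all assumptions chosen for the example. It enumerates every input, weights each by a product distribution, and reports how much the output probability shifts when one variable is fixed.

```python
from itertools import product


def dependence(f, n, i, p):
    """Return |P[f=1 | x_i=1] - P[f=1 | x_i=0]| under the product
    distribution in which bit j equals 1 with probability p[j].

    Hypothetical helper for illustration; exact enumeration over 2**n inputs.
    """
    num = [0.0, 0.0]  # num[b] accumulates P[f=1 and x_i=b]
    den = [0.0, 0.0]  # den[b] accumulates P[x_i=b]
    for x in product((0, 1), repeat=n):
        w = 1.0
        for j, b in enumerate(x):
            w *= p[j] if b else 1.0 - p[j]
        den[x[i]] += w
        num[x[i]] += w * f(x)
    return abs(num[1] / den[1] - num[0] / den[0])


def f(x):
    # Assumed target: parity of x[0] and x[1]; x[2] is irrelevant.
    return (x[0] + x[1]) % 2


uniform = [0.5, 0.5, 0.5]
skewed = [0.75, 0.75, 0.75]  # assumed bias, chosen only for the example

dep_uniform = [dependence(f, 3, i, uniform) for i in range(3)]
dep_skewed = [dependence(f, 3, i, skewed) for i in range(3)]
```

Under the uniform distribution every entry of `dep_uniform` is zero, so no single variable looks informative. Under the skewed distribution the two relevant variables show a nonzero dependence while the irrelevant one still shows none, which is the kind of signal that skewing and the PDC model are meant to exploit.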

    Original language: English (US)
    Pages (from-to): 2375-2411
    Number of pages: 37
    Journal: Journal of Machine Learning Research
    Volume: 10
    State: Published - 2009

    Keywords

    • Boolean functions
    • Correlation immune functions
    • Product distributions
    • Relevant variables
    • Skewing

    ASJC Scopus subject areas

    • Artificial Intelligence
    • Software
    • Control and Systems Engineering
    • Statistics and Probability

    Cite this

    Exploiting product distributions to identify relevant variables of correlation immune functions. / Hellerstein, Lisa; Roseli, Bernard; Bach, Eric; Ray, Soumya; Page, David.

    In: Journal of Machine Learning Research, Vol. 10, 2009, p. 2375-2411.

    Hellerstein, L, Roseli, B, Bach, E, Ray, S & Page, D 2009, 'Exploiting product distributions to identify relevant variables of correlation immune functions', Journal of Machine Learning Research, vol. 10, pp. 2375-2411.
    Hellerstein, Lisa ; Roseli, Bernard ; Bach, Eric ; Ray, Soumya ; Page, David. / Exploiting product distributions to identify relevant variables of correlation immune functions. In: Journal of Machine Learning Research. 2009 ; Vol. 10. pp. 2375-2411.
    @article{309d44fb493247289b50049b16ecbc3c,
    title = "Exploiting product distributions to identify relevant variables of correlation immune functions",
    keywords = "Boolean functions, Correlation immune functions, Product distributions, Relevant variables, Skewing",
    author = "Lisa Hellerstein and Bernard Roseli and Eric Bach and Soumya Ray and David Page",
    year = "2009",
    language = "English (US)",
    volume = "10",
    pages = "2375--2411",
    journal = "Journal of Machine Learning Research",
    issn = "1532-4435",
    publisher = "Microtome Publishing",

    }

    TY - JOUR

    T1 - Exploiting product distributions to identify relevant variables of correlation immune functions

    AU - Hellerstein, Lisa

    AU - Roseli, Bernard

    AU - Bach, Eric

    AU - Ray, Soumya

    AU - Page, David

    PY - 2009

    Y1 - 2009

    KW - Boolean functions

    KW - Correlation immune functions

    KW - Product distributions

    KW - Relevant variables

    KW - Skewing

    UR - http://www.scopus.com/inward/record.url?scp=70450267501&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=70450267501&partnerID=8YFLogxK

    M3 - Article

    AN - SCOPUS:70450267501

    VL - 10

    SP - 2375

    EP - 2411

    JO - Journal of Machine Learning Research

    JF - Journal of Machine Learning Research

    SN - 1532-4435

    ER -