### Abstract

A Boolean function f is correlation immune if each input variable is independent of the output, under the uniform distribution on inputs. For example, the parity function is correlation immune. We consider the problem of identifying relevant variables of a correlation immune function, in the presence of irrelevant variables. We address this problem in two different contexts. First, we analyze Skewing, a heuristic method that was developed to improve the ability of greedy decision tree algorithms to identify relevant variables of correlation immune Boolean functions, given examples drawn from the uniform distribution (Page and Ray, 2003). We present theoretical results revealing both the capabilities and limitations of skewing. Second, we explore the problem of identifying relevant variables in the Product Distribution Choice (PDC) learning model, a model in which the learner can choose product distributions and obtain examples from them. We prove a lemma establishing a property of Boolean functions that may be of independent interest. Using this lemma, we give two new algorithms for finding relevant variables of correlation immune functions in the PDC model.
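The definition above can be checked by brute force on a small function. The following is a minimal sketch, not the paper's Skewing procedure or either of its PDC algorithms. The helper names (`output_given_bit`, `is_correlation_immune`, `xor_first_two`) and the choice of bias 0.75 are assumptions made for illustration. It enumerates all inputs, weights them by a product distribution, and compares the probability that the output is 1 when a given variable is fixed to 0 versus 1.

```python
from itertools import product


def output_given_bit(f, n, i, p):
    """Return (P[f=1 | x_i=0], P[f=1 | x_i=1]) under the product
    distribution in which variable j equals 1 with probability p[j]."""
    mass = [0.0, 0.0]  # total probability of inputs with x_i = 0 / x_i = 1
    hit = [0.0, 0.0]   # probability of those inputs on which f is 1
    for x in product((0, 1), repeat=n):
        w = 1.0
        for bit, pj in zip(x, p):
            w *= pj if bit else 1.0 - pj
        mass[x[i]] += w
        hit[x[i]] += w * f(x)
    return hit[0] / mass[0], hit[1] / mass[1]


def is_correlation_immune(f, n, tol=1e-12):
    """True if every variable is independent of f under the uniform distribution."""
    uniform = [0.5] * n
    for i in range(n):
        p0, p1 = output_given_bit(f, n, i, uniform)
        if abs(p0 - p1) > tol:
            return False
    return True


def xor_first_two(x):
    """Parity of x[0] and x[1]; any further variables are irrelevant."""
    return x[0] ^ x[1]


def and_first_two(x):
    """AND of x[0] and x[1], shown for contrast."""
    return x[0] & x[1]


if __name__ == "__main__":
    n = 3  # two relevant variables plus one irrelevant variable
    print("xor is correlation immune:", is_correlation_immune(xor_first_two, n))
    print("and is correlation immune:", is_correlation_immune(and_first_two, n))
    skewed = [0.75] * n  # hypothetical non-uniform product distribution
    for i in range(n):
        p0, p1 = output_given_bit(xor_first_two, n, i, skewed)
        print(f"x{i}: P[f=1|x{i}=0]={p0:.3f}  P[f=1|x{i}=1]={p1:.3f}")
```

Under the uniform distribution, no single variable of the parity example shifts the output probability, so relevant and irrelevant variables look identical. Under the skewed distribution in this sketch, the two relevant variables show a gap while the irrelevant one does not, which is the kind of signal a learner that chooses its own product distributions can look for.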

| Original language | English (US) |
|---|---|
| Pages (from-to) | 2375-2411 |
| Number of pages | 37 |
| Journal | Journal of Machine Learning Research |
| Volume | 10 |
| State | Published - 2009 |

### Keywords

- Boolean functions
- Correlation immune functions
- Product distributions
- Relevant variables
- Skewing

### ASJC Scopus subject areas

- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability

### Cite this

Hellerstein, Lisa; Rosell, Bernard; Bach, Eric; Ray, Soumya; Page, David. **Exploiting product distributions to identify relevant variables of correlation immune functions.** *Journal of Machine Learning Research*, vol. 10, pp. 2375-2411, 2009.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Exploiting product distributions to identify relevant variables of correlation immune functions

AU - Hellerstein, Lisa

AU - Rosell, Bernard

AU - Bach, Eric

AU - Ray, Soumya

AU - Page, David

PY - 2009

Y1 - 2009

KW - Boolean functions

KW - Correlation immune functions

KW - Product distributions

KW - Relevant variables

KW - Skewing

UR - http://www.scopus.com/inward/record.url?scp=70450267501&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70450267501&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:70450267501

VL - 10

SP - 2375

EP - 2411

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

ER -