### Abstract

Most generalization bounds in learning theory are based on some measure of the complexity of the hypothesis class used, independently of any algorithm. In contrast, the notion of algorithmic stability can be used to derive tight generalization bounds that are tailored to specific learning algorithms by exploiting their particular properties. However, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed. In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence. This paper studies the scenario where the observations are drawn from a stationary φ-mixing or β-mixing sequence, a widely adopted assumption in the study of non-i.i.d. processes that implies a dependence between observations weakening over time. We prove novel and distinct stability-based generalization bounds for stationary φ-mixing and β-mixing sequences. These bounds strictly generalize the bounds given in the i.i.d. case and apply to all stable learning algorithms, thereby extending the use of stability-bounds to non-i.i.d. scenarios. We also illustrate the application of our φ-mixing generalization bounds to general classes of learning algorithms, including Support Vector Regression, Kernel Ridge Regression, and Support Vector Machines, and many other kernel regularization-based and relative entropy-based regularization algorithms. These novel bounds can thus be viewed as the first theoretical basis for the use of these algorithms in non-i.i.d. scenarios.

Original language | English (US) |
---|---|

Pages (from-to) | 789-814 |

Number of pages | 26 |

Journal | Journal of Machine Learning Research |

Volume | 11 |

State | Published - Feb 2010 |

### Fingerprint

### Keywords

- Algorithmic stability
- Generalization bounds
- Learning in non-i.i.d. scenarios
- Learning theory
- Mixing distributions
- Weakly dependent observations

### ASJC Scopus subject areas

- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability

### Cite this

*Journal of Machine Learning Research*,

*11*, 789-814.

**Stability bounds for stationary φ-mixing and β-mixing processes.** / Mohri, Mehryar; Rostamizadeh, Afshin.

Research output: Contribution to journal › Article

*Journal of Machine Learning Research*, vol. 11, pp. 789-814.

}

TY - JOUR

T1 - Stability bounds for stationary φ-mixing and β-mixing processes

AU - Mohri, Mehryar

AU - Rostamizadeh, Afshin

PY - 2010/2

Y1 - 2010/2

N2 - Most generalization bounds in learning theory are based on some measure of the complexity of the hypothesis class used, independently of any algorithm. In contrast, the notion of algorithmic stability can be used to derive tight generalization bounds that are tailored to specific learning algorithms by exploiting their particular properties. However, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed. In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence. This paper studies the scenario where the observations are drawn from a stationary φ-mixing or β-mixing sequence, a widely adopted assumption in the study of non-i.i.d. processes that implies a dependence between observations weakening over time. We prove novel and distinct stability-based generalization bounds for stationary φ-mixing and β-mixing sequences. These bounds strictly generalize the bounds given in the i.i.d. case and apply to all stable learning algorithms, thereby extending the use of stability-bounds to non-i.i.d. scenarios. We also illustrate the application of our φ-mixing generalization bounds to general classes of learning algorithms, including Support Vector Regression, Kernel Ridge Regression, and Support Vector Machines, and many other kernel regularization-based and relative entropy-based regularization algorithms. These novel bounds can thus be viewed as the first theoretical basis for the use of these algorithms in non-i.i.d. scenarios.

AB - Most generalization bounds in learning theory are based on some measure of the complexity of the hypothesis class used, independently of any algorithm. In contrast, the notion of algorithmic stability can be used to derive tight generalization bounds that are tailored to specific learning algorithms by exploiting their particular properties. However, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed. In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence. This paper studies the scenario where the observations are drawn from a stationary φ-mixing or β-mixing sequence, a widely adopted assumption in the study of non-i.i.d. processes that implies a dependence between observations weakening over time. We prove novel and distinct stability-based generalization bounds for stationary φ-mixing and β-mixing sequences. These bounds strictly generalize the bounds given in the i.i.d. case and apply to all stable learning algorithms, thereby extending the use of stability-bounds to non-i.i.d. scenarios. We also illustrate the application of our φ-mixing generalization bounds to general classes of learning algorithms, including Support Vector Regression, Kernel Ridge Regression, and Support Vector Machines, and many other kernel regularization-based and relative entropy-based regularization algorithms. These novel bounds can thus be viewed as the first theoretical basis for the use of these algorithms in non-i.i.d. scenarios.

KW - Algorithmic stability

KW - Generalization bounds

KW - Learning in non-i.i.d. scenarios

KW - Learning theory

KW - Mixing distributions

KW - Weakly dependent observations

UR - http://www.scopus.com/inward/record.url?scp=77949498582&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77949498582&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:77949498582

VL - 11

SP - 789

EP - 814

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

ER -