L2 regularization for learning kernels

Corinna Cortes, Mehryar Mohri, Afshin Rostamizadeh

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The choice of the kernel is critical to the success of many learning algorithms but it is typically left to the user. Instead, the training data can be used to learn the kernel by selecting it out of a given family, such as that of non-negative linear combinations of p base kernels, constrained by a trace or L1 regularization. This paper studies the problem of learning kernels with the same family of kernels but with an L2 regularization instead, and for regression problems. We analyze the problem of learning kernels with ridge regression. We derive the form of the solution of the optimization problem and give an efficient iterative algorithm for computing that solution. We present a novel theoretical analysis of the problem based on stability and give learning bounds for orthogonal kernels that contain only an additive term O(p/m) when compared to the standard kernel ridge regression stability bound. We also report the results of experiments indicating that L1 regularization can lead to modest improvements for a small number of kernels, but to performance degradations in larger-scale cases. In contrast, L2 regularization never degrades performance and in fact achieves significant improvements with a large number of kernels.
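
The abstract describes the setting only at a high level: kernel ridge regression combined with a non-negative linear combination of p base kernels whose weights are constrained by an L2 regularization. The following Python sketch is a rough illustration of that setting under stated assumptions; the specific weight update used here (rescaling the quantities alpha^T K_k alpha onto the L2 ball, with a damped interpolation step) and all function and parameter names are assumptions for illustration, not the iterative algorithm or solution form derived in the paper.

```python
import numpy as np

def krr_dual(K, y, lam):
    """Kernel ridge regression dual coefficients: alpha = (K + lam*I)^{-1} y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), y)

def learn_kernel_l2(base_kernels, y, lam=1.0, radius=1.0, n_iter=50, eta=0.5):
    """Illustrative alternating scheme (not the paper's exact update):
    fit KRR for fixed kernel weights mu, then move mu toward a point on the
    L2 sphere aligned with alpha^T K_k alpha, which keeps mu non-negative."""
    p = len(base_kernels)
    mu = np.full(p, radius / np.sqrt(p))          # uniform start on the L2 sphere
    alpha = None
    for _ in range(n_iter):
        K_mu = sum(m * K for m, K in zip(mu, base_kernels))
        alpha = krr_dual(K_mu, y, lam)            # ridge regression step
        v = np.array([alpha @ K @ alpha for K in base_kernels])  # >= 0 for PSD kernels
        target = radius * v / (np.linalg.norm(v) + 1e-12)        # scale onto the L2 ball
        mu = (1.0 - eta) * mu + eta * target      # damped (interpolated) update
    return mu, alpha

# Toy usage: three Gaussian base kernels of different widths on synthetic 1-D data.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(40)
sq_dists = (X - X.T) ** 2
base_kernels = [np.exp(-sq_dists / (2 * s ** 2)) for s in (0.1, 0.3, 1.0)]
mu, alpha = learn_kernel_l2(base_kernels, y)
print("learned kernel weights:", mu)
```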

Original language: English (US)
Title of host publication: Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI 2009
Pages: 109-116
Number of pages: 8
State: Published - 2009
Event: 25th Conference on Uncertainty in Artificial Intelligence, UAI 2009 - Montreal, QC, Canada
Duration: Jun 18 2009 - Jun 21 2009

Other

Other: 25th Conference on Uncertainty in Artificial Intelligence, UAI 2009
Country: Canada
City: Montreal, QC
Period: 6/18/09 - 6/21/09

Fingerprint

  • Regularization
  • Kernel
  • Learning algorithms
  • Degradation
  • Ridge regression
  • Experiments
  • Learning
  • Kernel regression
  • Iterative algorithm
  • Linear combination
  • Theoretical analysis
  • Efficient algorithms
  • Regression
  • Non-negative
  • Trace
  • Optimization problem
  • Computing

ASJC Scopus subject areas

  • Artificial Intelligence
  • Applied Mathematics

Cite this

APA: Cortes, C., Mohri, M., & Rostamizadeh, A. (2009). L2 regularization for learning kernels. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI 2009 (pp. 109-116).

Harvard: Cortes, C, Mohri, M & Rostamizadeh, A 2009, 'L2 regularization for learning kernels', in Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI 2009, pp. 109-116, 25th Conference on Uncertainty in Artificial Intelligence, UAI 2009, Montreal, QC, Canada, 6/18/09.

Vancouver: Cortes C, Mohri M, Rostamizadeh A. L2 regularization for learning kernels. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI 2009. 2009. p. 109-116.
@inproceedings{759de40c98a044d88c2010d8e671cb2b,
title = "L 2 regularization for learning kernels",
abstract = "The choice of the kernel is critical to the success of many learning algorithms but it is typically left to the user. Instead, the training data can be used to learn the kernel by selecting it out of a given family, such as that of non-negative linear combinations of p base kernels, constrained by a trace or L1 regularization. This paper studies the problem of learning kernels with the same family of kernels but with an L2 regularization instead, and for regression problems. We analyze the problem of learning kernels with ridge regression. We derive the form of the solution of the optimization problem and give an efficient iterative algorithm for computing that solution. We present a novel theoretical analysis of the problem based on stability and give learning bounds for orthogonal kernels that contain only an additive termO(p/m) when compared to the standard kernel ridge regression stability bound. We also report the results of experiments indicating that L1 regularization can lead to modest improvements for a small number of kernels, but to performance degradations in larger-scale cases. In contrast, L2 regularization never degrades performance and in fact achieves significant improvements with a large number of kernels.",
author = "Corinna Cortes and Mehryar Mohri and Afshin Rostamizadeh",
year = "2009",
language = "English (US)",
pages = "109--116",
booktitle = "Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI 2009",

}

TY - GEN

T1 - L2 regularization for learning kernels

AU - Cortes, Corinna

AU - Mohri, Mehryar

AU - Rostamizadeh, Afshin

PY - 2009

Y1 - 2009

UR - http://www.scopus.com/inward/record.url?scp=77958134983&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77958134983&partnerID=8YFLogxK

M3 - Conference contribution

SP - 109

EP - 116

BT - Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI 2009

ER -