Learning efficiently with approximate inference via dual losses

Ofer Meshi, David Sontag, Tommi Jaakkola, Amir Globerson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches to learning for structured prediction (e.g., cutting-plane, subgradient methods, perceptron) repeatedly make predictions for some of the data points. These approaches are computationally demanding because each prediction involves solving a linear program to optimality. We present a scalable algorithm for learning for structured prediction. The main idea is to instead solve the dual of the structured prediction loss. We formulate the learning task as a convex minimization over both the weights and the dual variables corresponding to each data point. As a result, we can begin to optimize the weights even before completely solving any of the individual prediction problems. We show how the dual variables can be efficiently optimized using coordinate descent. Our algorithm is competitive with state-of-the-art methods such as stochastic subgradient and cutting-plane.
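
For context, the following is a minimal sketch of the kind of objective the abstract describes, written in generic structured-SVM notation; the symbols used here (w for the weight vector, \phi for the feature map, \Delta for the task loss, \delta_i for per-example dual variables, \lambda for the regularization constant, N for the number of training examples) are illustrative and not necessarily the paper's own. The intractable loss-augmented inference inside the structured hinge loss is replaced by a linear programming relaxation, and that inner LP is then dualized, so learning becomes a single joint minimization:

    \min_{w,\,\{\delta_i\}} \;\; \frac{\lambda}{2}\|w\|^2 \;+\; \frac{1}{N}\sum_{i=1}^{N} \Big( D_i(\delta_i; w) \;-\; w^{\top}\phi(x_i, y_i) \Big)

where D_i(\delta_i; w) denotes the dual objective of the LP relaxation of the loss-augmented inference problem \max_y \big[ w^{\top}\phi(x_i, y) + \Delta(y_i, y) \big] for example i. By weak duality, any dual-feasible \delta_i makes D_i an upper bound on the relaxed inference value, so the expression above remains an upper bound on the regularized structured hinge loss; per the abstract, it can be minimized jointly and convexly over w and the \delta_i, which is what allows weight updates to be interleaved with coordinate-descent steps on the \delta_i before any individual prediction problem has been solved to completion.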

Original language: English (US)
Title of host publication: ICML 2010 - Proceedings, 27th International Conference on Machine Learning
Pages: 783-790
Number of pages: 8
State: Published - 2010
Event: 27th International Conference on Machine Learning, ICML 2010 - Haifa, Israel
Duration: Jun 21, 2010 - Jun 25, 2010

Other

Other: 27th International Conference on Machine Learning, ICML 2010
Country: Israel
City: Haifa
Period: 6/21/10 - 6/25/10

ASJC Scopus subject areas

  • Artificial Intelligence
  • Education

Cite this

Meshi, O., Sontag, D., Jaakkola, T., & Globerson, A. (2010). Learning efficiently with approximate inference via dual losses. In ICML 2010 - Proceedings, 27th International Conference on Machine Learning (pp. 783-790)

@inproceedings{1660cbf111c4437f85fa74218808a242,
title = "Learning efficiently with approximate inference via dual losses",
abstract = "Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches to learning for structured prediction (e.g., cutting-plane, subgradient methods, perceptron) repeatedly make predictions for some of the data points. These approaches are computationally demanding because each prediction involves solving a linear program to optimality. We present a scalable algorithm for learning for structured prediction. The main idea is to instead solve the dual of the structured prediction loss. We formulate the learning task as a convex minimization over both the weights and the dual variables corresponding to each data point. As a result, we can begin to optimize the weights even before completely solving any of the individual prediction problems. We show how the dual variables can be efficiently optimized using coordinate descent. Our algorithm is competitive with state-of-the-art methods such as stochastic subgradient and cutting-plane.",
author = "Ofer Meshi and David Sontag and Tommi Jaakkola and Amir Globerson",
year = "2010",
language = "English (US)",
isbn = "9781605589077",
pages = "783--790",
booktitle = "ICML 2010 - Proceedings, 27th International Conference on Machine Learning",

}
