Efficient large-scale distributed training of conditional maximum entropy models

Gideon Mann, Ryan McDonald, Mehryar Mohri, Nathan Silberman, Daniel D. Walker

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Training conditional maximum entropy models on massive data sets requires significant computational resources. We examine three common distributed training methods for conditional maxent: a distributed gradient computation method, a majority vote method, and a mixture weight method. We analyze and compare the CPU and network time complexity of each of these methods and present a theoretical analysis of conditional maxent models, including a study of the convergence of the mixture weight method, the most resource-efficient technique. We also report the results of large-scale experiments comparing these three methods, which demonstrate the benefits of the mixture weight method: it consumes fewer resources while achieving performance comparable to that of standard approaches.
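
Of the three approaches the abstract compares, the mixture weight method is the one it singles out as most resource-efficient: each machine trains a conditional maxent model on its own shard of the data with no communication during optimization, and the final model is obtained by averaging the per-shard weight vectors. The sketch below illustrates that idea only; it is not the authors' implementation, and it assumes a binary L2-regularized logistic regression (the simplest conditional maxent model), plain batch gradient descent, uniform mixture weights, and hypothetical helper names (train_maxent_shard, mixture_weight_train).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_maxent_shard(X, y, l2=1.0, lr=0.1, epochs=200):
    """Train a binary conditional maxent model (L2-regularized logistic regression)
    on a single shard with plain batch gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        p = sigmoid(X @ w)                       # P(y = 1 | x) under the current weights
        grad = X.T @ (p - y) / n + (l2 / n) * w  # gradient of the regularized neg. log-likelihood
        w -= lr * grad
    return w

def mixture_weight_train(shards, **train_kwargs):
    """Mixture weight method (sketch): train each shard independently, with no
    communication while optimizing, then average the per-shard weights once at the end."""
    weights = [train_maxent_shard(X, y, **train_kwargs) for X, y in shards]
    return np.mean(weights, axis=0)

# Toy usage: split one synthetic data set across 4 simulated machines.
rng = np.random.default_rng(0)
X = rng.normal(size=(4000, 20))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
shards = [(X[i::4], y[i::4]) for i in range(4)]
w_mix = mixture_weight_train(shards)
```

By contrast, a distributed gradient method would aggregate per-shard gradients and broadcast updated weights at every iteration, which is the source of its higher network cost noted in the abstract.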

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference
Pages: 1231-1239
Number of pages: 9
State: Published - 2009
Event: 23rd Annual Conference on Neural Information Processing Systems, NIPS 2009 - Vancouver, BC, Canada
Duration: Dec 7, 2009 - Dec 10, 2009

ASJC Scopus subject areas

  • Information Systems

Cite this

Mann, G., McDonald, R., Mohri, M., Silberman, N., & Walker, D. D. (2009). Efficient large-scale distributed training of conditional maximum entropy models. In Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference (pp. 1231-1239).

@inproceedings{93b6bd8ba3fb490e876da752cb6fffb5,
title = "Efficient large-scale distributed training of conditional maximum entropy models",
abstract = "Training conditional maximum entropy models on massive data sets requires significant computational resources. We examine three common distributed training methods for conditional maxent: a distributed gradient computation method, a majority vote method, and a mixture weight method. We analyze and compare the CPU and network time complexity of each of these methods and present a theoretical analysis of conditional maxent models, including a study of the convergence of the mixture weight method, the most resource-efficient technique. We also report the results of large-scale experiments comparing these three methods, which demonstrate the benefits of the mixture weight method: it consumes fewer resources while achieving performance comparable to that of standard approaches.",
author = "Gideon Mann and Ryan McDonald and Mehryar Mohri and Nathan Silberman and Daniel D. Walker",
year = "2009",
language = "English (US)",
isbn = "9781615679119",
pages = "1231--1239",
booktitle = "Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference",
}
