Simultaneous learning of trees and representations for extreme classification and density estimation

Yacine Jernite, Anna Choromanska, David Sontag

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We consider multi-class classification where the predictor has a hierarchical structure that allows for a very large number of labels both at train and test time. The predictive power of such models can heavily depend on the structure of the tree, and although past work showed how to learn the tree structure, it expected that the feature vectors remained static. We provide a novel algorithm to simultaneously perform representation learning for the input data and learning of the hierarchical predictor. Our approach optimizes an objective function which favors balanced and easily-separable multi-way node partitions. We theoretically analyze this objective, showing that it gives rise to a boosting style property and a bound on classification error. We next show how to extend the algorithm to conditional density estimation. We empirically validate both variants of the algorithm on text classification and language modeling, respectively, and show that they compare favorably to common baselines in terms of accuracy and running time.
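To make the abstract's setting concrete, the sketch below illustrates the general idea of tree-structured prediction over many labels: the probability of a label factors as the product of branching probabilities along a root-to-leaf path, so each prediction scores only O(sqrt(K)) candidates instead of O(K). This is a minimal illustration of the setting, not the paper's algorithm — in particular, the tree here is fixed and balanced by construction, whereas the paper learns the tree structure and the input representation jointly; all names and dimensions are invented for the example.

```python
import numpy as np

# Illustrative sketch only (not the authors' method): a two-level label
# tree where p(y | x) = p(cluster | x) * p(y | cluster, x).
# K labels are split into C balanced clusters; the paper's objective
# favors exactly such balanced, easily-separable node partitions.

rng = np.random.default_rng(0)
d, K, C = 16, 100, 10              # feature dim, labels, clusters (hypothetical sizes)
labels_per_cluster = K // C        # balanced partition: 10 labels per cluster

W_root = rng.normal(size=(C, d))                      # routes x to a cluster
W_leaf = rng.normal(size=(C, labels_per_cluster, d))  # per-cluster label scorer

def softmax(z):
    z = z - z.max()                # numerical stability
    e = np.exp(z)
    return e / e.sum()

def predict_proba(x):
    """Distribution over all K labels via the tree factorization."""
    p_cluster = softmax(W_root @ x)                    # shape (C,)
    p = np.concatenate(
        [p_cluster[c] * softmax(W_leaf[c] @ x) for c in range(C)]
    )
    return p                                           # shape (K,), sums to 1

x = rng.normal(size=d)
p = predict_proba(x)
assert p.shape == (K,) and abs(p.sum() - 1.0) < 1e-9
```

Finding the most probable label needs only the best cluster's scores, which is where the speedup over a flat K-way softmax comes from; in the paper's full algorithm the gradients of such an objective also flow into the representation of x itself.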

Original language: English (US)
Title of host publication: 34th International Conference on Machine Learning, ICML 2017
Publisher: International Machine Learning Society (IMLS)
Pages: 2613-2633
Number of pages: 21
Volume: 4
ISBN (Electronic): 9781510855144
State: Published - Jan 1 2017
Event: 34th International Conference on Machine Learning, ICML 2017 - Sydney, Australia
Duration: Aug 6 2017 - Aug 11 2017


ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

Cite this

Jernite, Y., Choromanska, A., & Sontag, D. (2017). Simultaneous learning of trees and representations for extreme classification and density estimation. In 34th International Conference on Machine Learning, ICML 2017 (Vol. 4, pp. 2613-2633). International Machine Learning Society (IMLS).

Simultaneous learning of trees and representations for extreme classification and density estimation. / Jernite, Yacine; Choromanska, Anna; Sontag, David.

34th International Conference on Machine Learning, ICML 2017. Vol. 4. International Machine Learning Society (IMLS), 2017. pp. 2613-2633.


Jernite, Y, Choromanska, A & Sontag, D 2017, Simultaneous learning of trees and representations for extreme classification and density estimation. in 34th International Conference on Machine Learning, ICML 2017. vol. 4, International Machine Learning Society (IMLS), pp. 2613-2633, 34th International Conference on Machine Learning, ICML 2017, Sydney, Australia, 8/6/17.
Jernite Y, Choromanska A, Sontag D. Simultaneous learning of trees and representations for extreme classification and density estimation. In 34th International Conference on Machine Learning, ICML 2017. Vol. 4. International Machine Learning Society (IMLS). 2017. p. 2613-2633
Jernite, Yacine; Choromanska, Anna; Sontag, David. / Simultaneous learning of trees and representations for extreme classification and density estimation. 34th International Conference on Machine Learning, ICML 2017. Vol. 4. International Machine Learning Society (IMLS), 2017. pp. 2613-2633.
@inproceedings{62917dd4eb8b4b7bafc2e1895b475374,
title = "Simultaneous learning of trees and representations for extreme classification and density estimation",
author = "Yacine Jernite and Anna Choromanska and David Sontag",
year = "2017",
month = "1",
day = "1",
language = "English (US)",
volume = "4",
pages = "2613--2633",
booktitle = "34th International Conference on Machine Learning, ICML 2017",
publisher = "International Machine Learning Society (IMLS)",

}

TY - GEN

T1 - Simultaneous learning of trees and representations for extreme classification and density estimation

AU - Jernite, Yacine

AU - Choromanska, Anna

AU - Sontag, David

PY - 2017/1/1

Y1 - 2017/1/1


UR - http://www.scopus.com/inward/record.url?scp=85048456742&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048456742&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85048456742

VL - 4

SP - 2613

EP - 2633

BT - 34th International Conference on Machine Learning, ICML 2017

PB - International Machine Learning Society (IMLS)

ER -