### Abstract

In many regression problems, the variable to be predicted depends not only on a sample-specific feature vector but also on an unknown (latent) manifold that must satisfy known constraints. House prices are one example: they depend both on the characteristics of the house and on the desirability of its neighborhood, which cannot be measured directly. The proposed method has two trainable components. The first is a parametric model that predicts the "intrinsic" price of a house from its description. The second is a smooth, non-parametric model of the latent "desirability" manifold. The predicted price of a house is the product of its intrinsic price and its desirability. The two components are trained simultaneously using a deterministic form of the EM algorithm. The model was trained on a large dataset of houses from Los Angeles County. It produces better predictions than purely parametric and purely non-parametric models. It also produces useful estimates of the desirability surface at each location.
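The two-component model in the abstract can be illustrated with a small sketch. In log space, price = intrinsic price × desirability becomes log price = log intrinsic price + log desirability, so the two parts can be fitted by alternating between them, loosely in the spirit of the deterministic EM-style training described above. This is not the authors' implementation: the linear intrinsic model, the leave-one-out Gaussian kernel smoother standing in for the non-parametric desirability model, the bandwidth, the function name `fit_latent_desirability`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_latent_desirability(X, coords, log_price, n_iters=20, bandwidth=1.0):
    """Alternately fit log_price ~= [X, 1] @ theta + z (hypothetical sketch).

    Assumptions (illustrative, not taken from the paper):
    - intrinsic log-price is linear in the features X, plus an intercept;
    - log-desirability z is a leave-one-out Gaussian kernel smoother of the
      residuals over the 2-D house coordinates.
    """
    n = X.shape[0]
    A = np.hstack([X, np.ones((n, 1))])  # design matrix plus intercept column
    z = np.zeros(n)  # current log-desirability at each training house

    # Row-normalized Gaussian kernel weights between house locations.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (2.0 * bandwidth**2))
    np.fill_diagonal(K, 0.0)  # leave-one-out: a house never smooths itself
    K /= K.sum(axis=1, keepdims=True)

    for _ in range(n_iters):
        # Step 1: hold z fixed, refit the parametric part by least squares.
        theta, *_ = np.linalg.lstsq(A, log_price - z, rcond=None)
        # Step 2: hold theta fixed, re-estimate z by smoothing the residuals.
        z = K @ (log_price - A @ theta)
        z -= z.mean()  # a constant shift in z is absorbed by the intercept
    return theta, z


# Synthetic demo data (hypothetical): one house feature plus a smooth
# location effect on a 10 x 10 map.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0.0, 10.0, size=(n, 2))
X = rng.normal(size=(n, 1))
z_true = 0.5 * np.sin(coords[:, 0] / 2.0)
log_price = 12.0 + 0.8 * X[:, 0] + z_true + 0.05 * rng.normal(size=n)

theta, z = fit_latent_desirability(X, coords, log_price)
intrinsic = np.exp(np.hstack([X, np.ones((n, 1))]) @ theta)
desirability = np.exp(z)  # multiplicative location factor per house
predicted_price = intrinsic * desirability
print("feature weight:", theta[0])
print("corr(z, z_true):", np.corrcoef(z, z_true)[0, 1])
```

The leave-one-out smoother is the key design choice here: if each house were allowed to smooth its own residual, setting z equal to the residual would drive the training error to zero without learning anything about the neighborhood.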

Original language | English (US) |
---|---|
Title of host publication | KDD-2007: Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
Pages | 173-182 |
Number of pages | 10 |
DOIs | https://doi.org/10.1145/1281192.1281214 |
State | Published, 2007 |
Event | KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, United States (Aug 12 2007 → Aug 15 2007) |

### Other

Event | KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
---|---|
Country | United States |
City | San Jose, CA |
Period | 8/12/07 → 8/15/07 |

### Keywords

- Energy-based models
- Expectation maximization
- Latent manifold models
- Structured prediction

### ASJC Scopus subject areas

- Information Systems

### Cite this

Chopra, Sumit; Thampy, Trivikraman; Leahy, John; Caplin, Andrew; LeCun, Yann (2007). **Discovering the hidden structure of house prices with a non-parametric latent manifold model.** In *KDD-2007: Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining* (pp. 173-182). KDD-2007: 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA, United States, 8/12/07. https://doi.org/10.1145/1281192.1281214

TY - GEN

T1 - Discovering the hidden structure of house prices with a non-parametric latent manifold model

AU - Chopra, Sumit

AU - Thampy, Trivikraman

AU - Leahy, John

AU - Caplin, Andrew

AU - LeCun, Yann

PY - 2007

Y1 - 2007

AB - In many regression problems, the variable to be predicted depends not only on a sample-specific feature vector but also on an unknown (latent) manifold that must satisfy known constraints. House prices are one example: they depend both on the characteristics of the house and on the desirability of its neighborhood, which cannot be measured directly. The proposed method has two trainable components. The first is a parametric model that predicts the "intrinsic" price of a house from its description. The second is a smooth, non-parametric model of the latent "desirability" manifold. The predicted price of a house is the product of its intrinsic price and its desirability. The two components are trained simultaneously using a deterministic form of the EM algorithm. The model was trained on a large dataset of houses from Los Angeles County. It produces better predictions than purely parametric and purely non-parametric models. It also produces useful estimates of the desirability surface at each location.

KW - Energy-based models

KW - Expectation maximization

KW - Latent manifold models

KW - Structured prediction

UR - http://www.scopus.com/inward/record.url?scp=36849089102&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=36849089102&partnerID=8YFLogxK

U2 - 10.1145/1281192.1281214

DO - 10.1145/1281192.1281214

M3 - Conference contribution

SN - 1595936092

SN - 9781595936097

SP - 173

EP - 182

BT - KDD-2007: Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

ER -