Clustering and classification through normalizing flows in feature space

J. P. Agnelli, M. Cadeiras, Esteban Tabak, C. V. Turner, Eric Vanden Eijnden

Research output: Contribution to journal › Article

Abstract

A unified variational methodology is developed for classification and clustering problems and is tested in the classification of tumors from gene expression data. It is based on fluid-like flows in feature space that cluster a set of observations by transforming them into likely samples from p isotropic Gaussians, where p is the number of classes sought. The methodology blurs the distinction between training and testing populations through the soft assignment of both to classes. The observations act as Lagrangian markers for the flows, comparatively active or passive depending on the current strength of the assignment to the corresponding class.

Original language: English (US)
Pages (from-to): 1784-1802
Number of pages: 19
Journal: Multiscale Modeling and Simulation
Volume: 8
Issue number: 5
DOI: 10.1137/100783522
State: Published - 2010

Keywords

  • Density estimation
  • Expectation maximization
  • Gaussianization
  • Inference
  • Machine learning
  • Maximum likelihood

ASJC Scopus subject areas

  • Modeling and Simulation
  • Chemistry(all)
  • Computer Science Applications
  • Ecological Modeling
  • Physics and Astronomy(all)

Cite this

Clustering and classification through normalizing flows in feature space. / Agnelli, J. P.; Cadeiras, M.; Tabak, Esteban; Turner, C. V.; Vanden Eijnden, Eric.

In: Multiscale Modeling and Simulation, Vol. 8, No. 5, 2010, p. 1784-1802.

@article{3283657eedf541b89dbc0d2635ce0d65,
title = "Clustering and classification through normalizing flows in feature space",
abstract = "A unified variational methodology is developed for classification and clustering problems and is tested in the classification of tumors from gene expression data. It is based on fluid-like flows in feature space that cluster a set of observations by transforming them into likely samples from p isotropic Gaussians, where p is the number of classes sought. The methodology blurs the distinction between training and testing populations through the soft assignment of both to classes. The observations act as Lagrangian markers for the flows, comparatively active or passive depending on the current strength of the assignment to the corresponding class.",
keywords = "Density estimation, Expectation maximization, Gaussianization, Inference, Machine learning, Maximum likelihood",
author = "Agnelli, {J. P.} and M. Cadeiras and Esteban Tabak and Turner, {C. V.} and {Vanden Eijnden}, Eric",
year = "2010",
doi = "10.1137/100783522",
language = "English (US)",
volume = "8",
pages = "1784--1802",
journal = "Multiscale Modeling and Simulation",
issn = "1540-3459",
publisher = "Society for Industrial and Applied Mathematics Publications",
number = "5",
}

TY - JOUR

T1 - Clustering and classification through normalizing flows in feature space

AU - Agnelli, J. P.

AU - Cadeiras, M.

AU - Tabak, Esteban

AU - Turner, C. V.

AU - Vanden Eijnden, Eric

PY - 2010

Y1 - 2010

N2 - A unified variational methodology is developed for classification and clustering problems and is tested in the classification of tumors from gene expression data. It is based on fluid-like flows in feature space that cluster a set of observations by transforming them into likely samples from p isotropic Gaussians, where p is the number of classes sought. The methodology blurs the distinction between training and testing populations through the soft assignment of both to classes. The observations act as Lagrangian markers for the flows, comparatively active or passive depending on the current strength of the assignment to the corresponding class.

AB - A unified variational methodology is developed for classification and clustering problems and is tested in the classification of tumors from gene expression data. It is based on fluid-like flows in feature space that cluster a set of observations by transforming them into likely samples from p isotropic Gaussians, where p is the number of classes sought. The methodology blurs the distinction between training and testing populations through the soft assignment of both to classes. The observations act as Lagrangian markers for the flows, comparatively active or passive depending on the current strength of the assignment to the corresponding class.

KW - Density estimation

KW - Expectation maximization

KW - Gaussianization

KW - Inference

KW - Machine learning

KW - Maximum likelihood

UR - http://www.scopus.com/inward/record.url?scp=79251505272&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79251505272&partnerID=8YFLogxK

U2 - 10.1137/100783522

DO - 10.1137/100783522

M3 - Article

VL - 8

SP - 1784

EP - 1802

JO - Multiscale Modeling and Simulation

JF - Multiscale Modeling and Simulation

SN - 1540-3459

IS - 5

ER -