INFUSE: Interactive feature selection for predictive modeling of high dimensional data

Josua Krause, Adam Perer, Enrico Bertini

    Research output: Contribution to journal (Article)

    Abstract

    Predictive modeling techniques are increasingly being used by data scientists to understand the probability of predicted outcomes. However, for data that is high-dimensional, a critical step in predictive modeling is determining which features should be included in the models. Feature selection algorithms are often used to remove non-informative features from models. However, there are many different classes of feature selection algorithms. Deciding which one to use is problematic as the algorithmic output is often not amenable to user interpretation. This limits the ability for users to utilize their domain expertise during the modeling process. To improve on this limitation, we developed INFUSE, a novel visual analytics system designed to help analysts understand how predictive features are being ranked across feature selection algorithms, cross-validation folds, and classifiers. We demonstrate how our system can lead to important insights in a case study involving clinical researchers predicting patient outcomes from electronic medical records.

    Original language: English (US)
    Article number: 6876047
    Pages (from-to): 1614-1623
    Number of pages: 10
    Journal: IEEE Transactions on Visualization and Computer Graphics
    Volume: 20
    Issue number: 12
    DOI: 10.1109/TVCG.2014.2346482
    State: Published - Dec 31 2014

    Fingerprint

    Feature extraction
    Electronic medical equipment
    Classifiers

    Keywords

    • classification
    • feature selection
    • high-dimensional data
    • Predictive modeling
    • visual analytics

    ASJC Scopus subject areas

    • Computer Graphics and Computer-Aided Design
    • Software
    • Computer Vision and Pattern Recognition
    • Signal Processing

    Cite this

    INFUSE: Interactive feature selection for predictive modeling of high dimensional data. / Krause, Josua; Perer, Adam; Bertini, Enrico.

    In: IEEE Transactions on Visualization and Computer Graphics, Vol. 20, No. 12, 6876047, 31.12.2014, p. 1614-1623.

    @article{ff3a939aaa7b473a97de060c2d935212,
    title = "INFUSE: Interactive feature selection for predictive modeling of high dimensional data",
    abstract = "Predictive modeling techniques are increasingly being used by data scientists to understand the probability of predicted outcomes. However, for data that is high-dimensional, a critical step in predictive modeling is determining which features should be included in the models. Feature selection algorithms are often used to remove non-informative features from models. However, there are many different classes of feature selection algorithms. Deciding which one to use is problematic as the algorithmic output is often not amenable to user interpretation. This limits the ability for users to utilize their domain expertise during the modeling process. To improve on this limitation, we developed INFUSE, a novel visual analytics system designed to help analysts understand how predictive features are being ranked across feature selection algorithms, cross-validation folds, and classifiers. We demonstrate how our system can lead to important insights in a case study involving clinical researchers predicting patient outcomes from electronic medical records.",
    keywords = "classification, feature selection, high-dimensional data, Predictive modeling, visual analytics",
    author = "Josua Krause and Adam Perer and Enrico Bertini",
    year = "2014",
    month = "12",
    day = "31",
    doi = "10.1109/TVCG.2014.2346482",
    language = "English (US)",
    volume = "20",
    pages = "1614--1623",
    journal = "IEEE Transactions on Visualization and Computer Graphics",
    issn = "1077-2626",
    publisher = "IEEE Computer Society",
    number = "12",

    }

    TY - JOUR

    T1 - INFUSE

    T2 - Interactive feature selection for predictive modeling of high dimensional data

    AU - Krause, Josua

    AU - Perer, Adam

    AU - Bertini, Enrico

    PY - 2014/12/31

    Y1 - 2014/12/31

    N2 - Predictive modeling techniques are increasingly being used by data scientists to understand the probability of predicted outcomes. However, for data that is high-dimensional, a critical step in predictive modeling is determining which features should be included in the models. Feature selection algorithms are often used to remove non-informative features from models. However, there are many different classes of feature selection algorithms. Deciding which one to use is problematic as the algorithmic output is often not amenable to user interpretation. This limits the ability for users to utilize their domain expertise during the modeling process. To improve on this limitation, we developed INFUSE, a novel visual analytics system designed to help analysts understand how predictive features are being ranked across feature selection algorithms, cross-validation folds, and classifiers. We demonstrate how our system can lead to important insights in a case study involving clinical researchers predicting patient outcomes from electronic medical records.

    AB - Predictive modeling techniques are increasingly being used by data scientists to understand the probability of predicted outcomes. However, for data that is high-dimensional, a critical step in predictive modeling is determining which features should be included in the models. Feature selection algorithms are often used to remove non-informative features from models. However, there are many different classes of feature selection algorithms. Deciding which one to use is problematic as the algorithmic output is often not amenable to user interpretation. This limits the ability for users to utilize their domain expertise during the modeling process. To improve on this limitation, we developed INFUSE, a novel visual analytics system designed to help analysts understand how predictive features are being ranked across feature selection algorithms, cross-validation folds, and classifiers. We demonstrate how our system can lead to important insights in a case study involving clinical researchers predicting patient outcomes from electronic medical records.

    KW - classification

    KW - feature selection

    KW - high-dimensional data

    KW - Predictive modeling

    KW - visual analytics

    UR - http://www.scopus.com/inward/record.url?scp=84910066488&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84910066488&partnerID=8YFLogxK

    U2 - 10.1109/TVCG.2014.2346482

    DO - 10.1109/TVCG.2014.2346482

    M3 - Article

    VL - 20

    SP - 1614

    EP - 1623

    JO - IEEE Transactions on Visualization and Computer Graphics

    JF - IEEE Transactions on Visualization and Computer Graphics

    SN - 1077-2626

    IS - 12

    M1 - 6876047

    ER -