A data-driven model for spectra: Finding double redshifts in the Sloan Digital Sky Survey

P. Tsalmantza, David W. Hogg

    Research output: Contribution to journal › Article

    Abstract

    We present a data-driven method - heteroscedastic matrix factorization, a kind of probabilistic factor analysis - for modeling or performing dimensionality reduction on observed spectra or other high-dimensional data with known but non-uniform observational uncertainties. The method uses an iterative inverse-variance-weighted least-squares minimization procedure to generate a best set of basis functions. The method is similar to principal components analysis (PCA), but with the substantial advantage that it uses measurement uncertainties in a responsible way and accounts naturally for poorly measured and missing data; it models the variance in the noise-deconvolved data space. A regularization can be applied, in the form of a smoothness prior (inspired by Gaussian processes) or a non-negative constraint, without making the method prohibitively slow. Because the method optimizes a justified scalar (related to the likelihood), the basis provides a better fit to the data in a probabilistic sense than any PCA basis. We test the method on Sloan Digital Sky Survey (SDSS) spectra, concentrating on spectra known to contain two redshift components: these are spectra of gravitational lens candidates and massive black hole binaries. We apply a hypothesis test to compare one-redshift and two-redshift models for these spectra, utilizing the data-driven model trained on a random subset of all SDSS spectra. This test confirms 129 of the 131 lens candidates in our sample and all of the known binary candidates, and turns up very few false positives.
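The iterative inverse-variance-weighted least-squares procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `hmf` and `chi2`, the random initialization, the fixed iteration count, and the tiny `ridge` term (added only to keep the linear solves well-conditioned) are all assumptions made for illustration, and the smoothness prior and non-negative constraint mentioned in the abstract are omitted. The sketch alternates between solving for per-spectrum coefficients with the basis fixed and solving for per-pixel basis values with the coefficients fixed, each time weighting by the inverse variances. Pixels with zero weight (missing data) contribute nothing to the fit.

```python
import numpy as np


def chi2(Y, W, A, G):
    """Inverse-variance-weighted sum of squared residuals of the model A @ G."""
    return float(np.sum(W * (Y - A @ G) ** 2))


def hmf(Y, W, K, n_iter=20, seed=0):
    """Alternating weighted least squares for Y ~ A @ G (illustrative sketch).

    Y : (N, M) data, one spectrum per row
    W : (N, M) inverse variances; 0 marks a missing pixel
    K : number of basis functions
    """
    rng = np.random.default_rng(seed)
    N, M = Y.shape
    G = rng.normal(size=(K, M))   # basis functions (assumed random start)
    A = np.zeros((N, K))          # per-spectrum coefficients
    ridge = 1e-9 * np.eye(K)      # assumption: tiny ridge for numerical stability
    history = []
    for _ in range(n_iter):
        # a-step: weighted least squares for each spectrum, basis held fixed
        for i in range(N):
            GW = G * W[i]                                   # (K, M)
            A[i] = np.linalg.solve(GW @ G.T + ridge, GW @ Y[i])
        # g-step: weighted least squares for each pixel, coefficients held fixed
        for j in range(M):
            AW = A.T * W[:, j]                              # (K, N)
            G[:, j] = np.linalg.solve(AW @ A + ridge, AW @ Y[:, j])
        history.append(chi2(Y, W, A, G))
    return A, G, history
```

Calling `hmf(Y, W, K)` returns the coefficients, the basis, and the weighted chi-squared after each full iteration. Because each step solves its own weighted least-squares problem exactly, the recorded values should not increase from one iteration to the next, apart from the negligible effect of the ridge term.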

    Original language: English (US)
    Article number: 122
    Journal: Astrophysical Journal
    Volume: 753
    Issue number: 2
    DOI: 10.1088/0004-637X/753/2/122
    State: Published - Jul 10 2012

    Keywords

    • black hole physics
    • cosmology: observations
    • gravitational lensing: strong
    • methods: data analysis
    • methods: statistical
    • techniques: spectroscopic

    ASJC Scopus subject areas

    • Space and Planetary Science
    • Astronomy and Astrophysics

    Cite this

    A data-driven model for spectra: Finding double redshifts in the Sloan Digital Sky Survey. / Tsalmantza, P.; Hogg, David W.

    In: Astrophysical Journal, Vol. 753, No. 2, 122, 10.07.2012.

    @article{12240f7e0ef54d8a82e8e550a87f97f6,
    title = "A data-driven model for spectra: Finding double redshifts in the Sloan Digital Sky Survey",
    abstract = "We present a data-driven method - heteroscedastic matrix factorization, a kind of probabilistic factor analysis - for modeling or performing dimensionality reduction on observed spectra or other high-dimensional data with known but non-uniform observational uncertainties. The method uses an iterative inverse-variance-weighted least-squares minimization procedure to generate a best set of basis functions. The method is similar to principal components analysis (PCA), but with the substantial advantage that it uses measurement uncertainties in a responsible way and accounts naturally for poorly measured and missing data; it models the variance in the noise-deconvolved data space. A regularization can be applied, in the form of a smoothness prior (inspired by Gaussian processes) or a non-negative constraint, without making the method prohibitively slow. Because the method optimizes a justified scalar (related to the likelihood), the basis provides a better fit to the data in a probabilistic sense than any PCA basis. We test the method on Sloan Digital Sky Survey (SDSS) spectra, concentrating on spectra known to contain two redshift components: these are spectra of gravitational lens candidates and massive black hole binaries. We apply a hypothesis test to compare one-redshift and two-redshift models for these spectra, utilizing the data-driven model trained on a random subset of all SDSS spectra. This test confirms 129 of the 131 lens candidates in our sample and all of the known binary candidates, and turns up very few false positives.",
    keywords = "black hole physics, cosmology: observations, gravitational lensing: strong, methods: data analysis, methods: statistical, techniques: spectroscopic",
    author = "Tsalmantza, P. and Hogg, {David W.}",
    year = "2012",
    month = "7",
    day = "10",
    doi = "10.1088/0004-637X/753/2/122",
    language = "English (US)",
    volume = "753",
    journal = "Astrophysical Journal",
    issn = "0004-637X",
    publisher = "IOP Publishing Ltd.",
    number = "2",

    }

    TY - JOUR

    T1 - A data-driven model for spectra

    T2 - Finding double redshifts in the Sloan Digital Sky Survey

    AU - Tsalmantza, P.

    AU - Hogg, David W.

    PY - 2012/7/10

    Y1 - 2012/7/10

    N2 - We present a data-driven method - heteroscedastic matrix factorization, a kind of probabilistic factor analysis - for modeling or performing dimensionality reduction on observed spectra or other high-dimensional data with known but non-uniform observational uncertainties. The method uses an iterative inverse-variance-weighted least-squares minimization procedure to generate a best set of basis functions. The method is similar to principal components analysis (PCA), but with the substantial advantage that it uses measurement uncertainties in a responsible way and accounts naturally for poorly measured and missing data; it models the variance in the noise-deconvolved data space. A regularization can be applied, in the form of a smoothness prior (inspired by Gaussian processes) or a non-negative constraint, without making the method prohibitively slow. Because the method optimizes a justified scalar (related to the likelihood), the basis provides a better fit to the data in a probabilistic sense than any PCA basis. We test the method on Sloan Digital Sky Survey (SDSS) spectra, concentrating on spectra known to contain two redshift components: these are spectra of gravitational lens candidates and massive black hole binaries. We apply a hypothesis test to compare one-redshift and two-redshift models for these spectra, utilizing the data-driven model trained on a random subset of all SDSS spectra. This test confirms 129 of the 131 lens candidates in our sample and all of the known binary candidates, and turns up very few false positives.

    KW - black hole physics

    KW - cosmology: observations

    KW - gravitational lensing: strong

    KW - methods: data analysis

    KW - methods: statistical

    KW - techniques: spectroscopic

    UR - http://www.scopus.com/inward/record.url?scp=84862878902&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84862878902&partnerID=8YFLogxK

    U2 - 10.1088/0004-637X/753/2/122

    DO - 10.1088/0004-637X/753/2/122

    M3 - Article

    VL - 753

    JO - Astrophysical Journal

    JF - Astrophysical Journal

    SN - 0004-637X

    IS - 2

    M1 - 122

    ER -