Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data

Boris Leistedt, David W. Hogg

    Research output: Contribution to journal › Article

    Abstract

    We present a new method for inferring photometric redshifts in deep galaxy and quasar surveys, based on a data-driven model of latent spectral energy distributions (SEDs) and a physical model of photometric fluxes as a function of redshift. This conceptually novel approach combines the advantages of both machine learning methods and template fitting methods by building template SEDs directly from the spectroscopic training data. This is made computationally tractable with Gaussian processes operating in flux-redshift space, encoding the physics of redshifts and the projection of galaxy SEDs onto photometric bandpasses. This method alleviates the need to acquire representative training data or to construct detailed galaxy SED models; it requires only that the photometric bandpasses and calibrations be known or have parameterized unknowns. The training data can consist of a combination of spectroscopic and deep many-band photometric data with reliable redshifts, which do not need to entirely spatially overlap with the target survey of interest or even involve the same photometric bands. We showcase the method on the i-magnitude-selected, spectroscopically confirmed galaxies in the COSMOS field. The model is trained on the deepest bands (from SUBARU and HST) and photometric redshifts are derived using the shallower SDSS optical bands only. We demonstrate that we obtain accurate redshift point estimates and probability distributions despite the training and target sets having very different redshift distributions, noise properties, and even photometric bands. Our model can also be used to predict missing photometric fluxes or to simulate populations of galaxies with realistic fluxes and redshifts, for example.
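The abstract's "physical model of photometric fluxes as a function of redshift" can be illustrated with a minimal sketch: redshift a rest-frame SED and average it through a bandpass. This is not the authors' Gaussian-process model. The toy SED (a single step at 4000 Angstrom), the top-hat band edges, and the function names `toy_sed` and `band_flux` are all assumptions made for illustration, and distance dimming is ignored so that only colors are meaningful.

```python
import numpy as np


def toy_sed(lam_rest):
    """Toy rest-frame SED f_nu (arbitrary units) with a step at 4000 Angstrom.

    Hypothetical stand-in for a real template or a learned latent SED.
    """
    lam_rest = np.asarray(lam_rest, dtype=float)
    return np.where(lam_rest < 4000.0, 0.3, 1.0)


def band_flux(sed, z, lam_lo, lam_hi, n=2001):
    """Mean f_nu through a top-hat band [lam_lo, lam_hi] in the observed frame.

    Uses photon-counting weights d(lam)/lam on a uniform wavelength grid.
    Distance dimming is ignored, so only flux ratios (colors) are meaningful.
    """
    lam_obs = np.linspace(lam_lo, lam_hi, n)
    weight = 1.0 / lam_obs
    # Light observed at lam_obs was emitted at lam_obs / (1 + z).
    f = sed(lam_obs / (1.0 + z))
    return np.sum(f * weight) / np.sum(weight)


# Hypothetical top-hat bands (Angstrom); real filter curves would replace these.
bands = {"g": (4000.0, 5500.0), "r": (5500.0, 7000.0), "i": (7000.0, 8500.0)}

for z in (0.0, 0.5, 1.0):
    fluxes = {name: band_flux(toy_sed, z, lo, hi) for name, (lo, hi) in bands.items()}
    print(z, {name: round(float(val), 3) for name, val in fluxes.items()})
```

As the break moves redward through the bands with increasing redshift, the band-to-band flux ratios change, and that dependence is what makes photometric fluxes informative about redshift.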

    Original language: English (US)
    Article number: 5
    Journal: Astrophysical Journal
    Volume: 838
    Issue number: 1
    DOI: 10.3847/1538-4357/aa6332
    State: Published - Mar 20 2017

    Keywords

    • galaxies: distances and redshifts
    • large-scale structure of universe

    ASJC Scopus subject areas

    • Astronomy and Astrophysics
    • Space and Planetary Science

    Cite this

    Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data. / Leistedt, Boris; Hogg, David W.

    In: Astrophysical Journal, Vol. 838, No. 1, 5, 20.03.2017.

    @article{642ae74b78874c5c8139f2c01ba1512b,
    title = "Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data",
    keywords = "galaxies: distances and redshifts, large-scale structure of universe",
    author = "Leistedt, Boris and Hogg, {David W.}",
    year = "2017",
    month = "3",
    day = "20",
    doi = "10.3847/1538-4357/aa6332",
    language = "English (US)",
    volume = "838",
    journal = "Astrophysical Journal",
    issn = "0004-637X",
    publisher = "IOP Publishing Ltd.",
    number = "1",

    }

    TY - JOUR

    T1 - Data-driven, Interpretable Photometric Redshifts Trained on Heterogeneous and Unrepresentative Data

    AU - Leistedt, Boris

    AU - Hogg, David W.

    PY - 2017/3/20

    Y1 - 2017/3/20

    KW - galaxies: distances and redshifts

    KW - large-scale structure of universe

    UR - http://www.scopus.com/inward/record.url?scp=85016792867&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85016792867&partnerID=8YFLogxK

    U2 - 10.3847/1538-4357/aa6332

    DO - 10.3847/1538-4357/aa6332

    M3 - Article

    VL - 838

    JO - Astrophysical Journal

    JF - Astrophysical Journal

    SN - 0004-637X

    IS - 1

    M1 - 5

    ER -