Exponent monitoring for low-cost concurrent error detection in FPU control logic

Mihalis Maniatakos, Yiorgos Makris, Prabhakar Kudva, Bruce Fleischer

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    We present a non-intrusive concurrent error detection (CED) method for protecting the control logic of a contemporary floating-point unit (FPU). The method builds on the observation that control logic errors cause extensive datapath corruption and, with high probability, affect the exponent part of the IEEE 754 floating-point representation. Exponent monitoring can therefore be used to detect errors in the FPU control logic. Because predicting the exponent involves relatively simple operations, our method incurs significantly lower overhead than the classical approach of duplicating the FPU control logic. Experimental results on the OpenSPARC T1 processor show that, whereas control logic duplication incurs an area overhead of 17.9% of the FPU size, our method incurs an area overhead of only 5.8% while still detecting over 95% of transient errors in the FPU control logic. The method also offers the ancillary benefit of detecting 98.1% of datapath errors that affect the exponent, which duplication of the control logic cannot detect. Finally, when combined with a classical residue code-based method for the fraction, our method yields a complete CED solution for the entire FPU that covers 94.4% of all errors at an area cost of 16.32% of the FPU size.
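
    The abstract does not describe the predictor circuit itself, so the following is only an illustrative software sketch of the general idea of exponent monitoring, not the authors' hardware design. It assumes single-precision multiplication with normal operands and a normal result (no zeros, subnormals, infinities, NaNs, overflow, or underflow). Under those assumptions the biased result exponent must equal either the sum of the operand exponents minus the bias, or that value plus one after normalization. The function names are hypothetical.

```python
import struct

BIAS = 127  # IEEE 754 single-precision exponent bias


def to_f32(x: float) -> float:
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def biased_exponent(x: float) -> int:
    """Extract the 8-bit biased exponent field of x viewed as a float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF


def predicted_mul_exponents(a: float, b: float) -> set:
    """Predict the legal result exponents for a * b.

    Assumption: a, b, and the product are normal float32 numbers.
    The significand product lies in [1, 4), so at most one
    normalization shift can increment the exponent.
    """
    base = biased_exponent(a) + biased_exponent(b) - BIAS
    return {base, base + 1}


def exponent_check_passes(a: float, b: float, observed: float) -> bool:
    """Return True if the observed result has a predicted exponent."""
    return biased_exponent(observed) in predicted_mul_exponents(a, b)


a, b = to_f32(1.5), to_f32(3.0)
good = to_f32(a * b)       # fault-free product
corrupted = good * 16.0    # hypothetical faulty result with a shifted exponent

print(exponent_check_passes(a, b, good))
print(exponent_check_passes(a, b, corrupted))
```

    In this sketch the check only inspects the exponent field, so an error confined to the fraction would go unnoticed; the abstract pairs exponent monitoring with a residue code on the fraction for that reason.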

    Original language: English (US)
    Title of host publication: Proceedings - 2011 29th IEEE VLSI Test Symposium, VTS 2011
    Pages: 235-240
    Number of pages: 6
    DOI: https://doi.org/10.1109/VTS.2011.5783727
    State: Published - Jul 1 2011
    Event: 2011 29th IEEE VLSI Test Symposium, VTS 2011 - Dana Point, CA, United States
    Duration: May 1 2011 to May 5 2011

    Fingerprint

    Error detection
    Monitoring
    Costs

    ASJC Scopus subject areas

    • Computer Science Applications
    • Electrical and Electronic Engineering

    Cite this

    Maniatakos, M., Makris, Y., Kudva, P., & Fleischer, B. (2011). Exponent monitoring for low-cost concurrent error detection in FPU control logic. In Proceedings - 2011 29th IEEE VLSI Test Symposium, VTS 2011 (pp. 235-240). [5783727] https://doi.org/10.1109/VTS.2011.5783727

    @inproceedings{ccd00f64accb4baea44f3e5599c91c0b,
    title = "Exponent monitoring for low-cost concurrent error detection in FPU control logic",
    author = "Mihalis Maniatakos and Yiorgos Makris and Prabhakar Kudva and Bruce Fleischer",
    year = "2011",
    month = "7",
    day = "1",
    doi = "10.1109/VTS.2011.5783727",
    language = "English (US)",
    isbn = "9781612846552",
    pages = "235--240",
    booktitle = "Proceedings - 2011 29th IEEE VLSI Test Symposium, VTS 2011",

    }

    TY - GEN

    T1 - Exponent monitoring for low-cost concurrent error detection in FPU control logic

    AU - Maniatakos, Mihalis

    AU - Makris, Yiorgos

    AU - Kudva, Prabhakar

    AU - Fleischer, Bruce

    PY - 2011/7/1

    Y1 - 2011/7/1

    UR - http://www.scopus.com/inward/record.url?scp=79959652695&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=79959652695&partnerID=8YFLogxK

    U2 - 10.1109/VTS.2011.5783727

    DO - 10.1109/VTS.2011.5783727

    M3 - Conference contribution

    SN - 9781612846552

    SP - 235

    EP - 240

    BT - Proceedings - 2011 29th IEEE VLSI Test Symposium, VTS 2011

    ER -