Use it or lose it: Wear-out and lifetime in future chip multiprocessors

Hyungjun Kim, Arseniy Vitkovskiy, Paul V. Gratz, Vassos Soteriou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Moore's Law scaling is continuing to yield even higher transistor density with each succeeding process generation, leading to today's multi-core Chip Multi-Processors (CMPs) with tens or even hundreds of interconnected cores or tiles. Unfortunately, deep sub-micron CMOS process technology is marred by increasing susceptibility to wearout. Prolonged operational stress gives rise to accelerated wearout and failure, due to several physical failure mechanisms, including Hot Carrier Injection (HCI) and Negative Bias Temperature Instability (NBTI). Each failure mechanism correlates with different usage-based stresses, all of which can eventually generate permanent faults. While the wearout of an individual core in many-core CMPs may not necessarily be catastrophic for the system, a single fault in the inter-processor Network-on-Chip (NoC) fabric could render the entire chip useless, as it could lead to protocol-level deadlocks, or even partition away vital components such as the memory controller or other critical I/O. In this paper, we develop critical path models for HCI- and NBTI-induced wear due to the actual stresses caused by real workloads, applied onto the interconnect microarchitecture. A key finding from this modeling is that, counter to prevailing wisdom, wearout in the CMP on-chip interconnect is correlated with lack of load observed in the NoC routers, rather than high load. We then develop a novel wearout-decelerating scheme in which routers under low load have their wearout-sensitive components exercised, without significantly impacting cycle time, pipeline depth, area or power consumption of the overall router. We subsequently show that the proposed design yields a 13.8x-65x increase in CMP lifetime.
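
The link between low load and wearout can be illustrated with a toy calculation. The sketch below is not the paper's critical path model and does not reproduce its 13.8x-65x result. It assumes a simple power-law form for NBTI-induced threshold-voltage shift, dVth = A * (alpha * t)^n, where alpha is the fraction of time a transistor sits under stress bias, and it further assumes that an idle router holds some transistors under near-constant bias while an exercised router toggles them. The constants `a` and `n`, the failure threshold, and both duty cycles are hypothetical values chosen only for illustration.

```python
# Toy model only: power-law NBTI threshold-voltage shift,
#   dVth = a * (alpha * t) ** n
# All constants and duty cycles are illustrative assumptions,
# not values taken from the paper.


def nbti_shift(alpha: float, t: float, a: float = 1.0, n: float = 1.0 / 6.0) -> float:
    """Threshold-voltage shift after time t at stress duty cycle alpha (0..1)."""
    return a * (alpha * t) ** n


def lifetime(alpha: float, dvth_max: float, a: float = 1.0, n: float = 1.0 / 6.0) -> float:
    """Time at which the shift reaches dvth_max (the inverse of nbti_shift)."""
    if alpha <= 0.0:
        return float("inf")  # never stressed, so never wears out in this model
    return (dvth_max / a) ** (1.0 / n) / alpha


# Hypothetical duty cycles: an idle router keeps a transistor under stress
# almost all the time; exercising it balances stress and recovery phases.
idle_alpha = 0.95
exercised_alpha = 0.50
dvth_max = 2.0  # hypothetical failure threshold, arbitrary units

gain = lifetime(exercised_alpha, dvth_max) / lifetime(idle_alpha, dvth_max)
print(f"lifetime gain from exercising: {gain:.2f}x")
```

In this toy model the gain reduces to the ratio of the two duty cycles, so it only shows the direction of the effect: lowering the stress duty cycle of an idle component extends the time before it reaches the failure threshold.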

Original language: English (US)
Title of host publication: MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Pages: 136-147
Number of pages: 12
DOIs: https://doi.org/10.1145/2540708.2540721
State: Published - Dec 1 2013
Event: 46th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2013 - Davis, CA, United States
Duration: Dec 7 2013 → Dec 11 2013

Other

Other: 46th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2013
Country: United States
City: Davis, CA
Period: 12/7/13 → 12/11/13

Fingerprint

Routers
Hot carriers
Wear of materials
Scaling laws
Tile
Transistors
Electric power utilization
Pipelines
Data storage equipment
Controllers
Network-on-chip
Negative bias temperature instability

Keywords

  • hot carrier injection (HCI)
  • lifetime
  • negative bias temperature instability (NBTI)
  • network-on-chip
  • reliability
  • wearout

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Kim, H., Vitkovskiy, A., Gratz, P. V., & Soteriou, V. (2013). Use it or lose it: Wear-out and lifetime in future chip multiprocessors. In MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (pp. 136-147). https://doi.org/10.1145/2540708.2540721

Use it or lose it: Wear-out and lifetime in future chip multiprocessors. / Kim, Hyungjun; Vitkovskiy, Arseniy; Gratz, Paul V.; Soteriou, Vassos.

MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. 2013. p. 136-147.

Kim, H, Vitkovskiy, A, Gratz, PV & Soteriou, V 2013, Use it or lose it: Wear-out and lifetime in future chip multiprocessors. in MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. pp. 136-147, 46th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2013, Davis, CA, United States, 12/7/13. https://doi.org/10.1145/2540708.2540721
Kim H, Vitkovskiy A, Gratz PV, Soteriou V. Use it or lose it: Wear-out and lifetime in future chip multiprocessors. In MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. 2013. p. 136-147. https://doi.org/10.1145/2540708.2540721
Kim, Hyungjun ; Vitkovskiy, Arseniy ; Gratz, Paul V. ; Soteriou, Vassos. / Use it or lose it: Wear-out and lifetime in future chip multiprocessors. MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture. 2013. pp. 136-147
@inproceedings{f2da37c3c40a4fc48a8fba397f82cf4e,
title = "Use it or lose it: Wear-out and lifetime in future chip multiprocessors",
abstract = "Moore's Law scaling is continuing to yield even higher transistor density with each succeeding process generation, leading to today's multi-core Chip Multi-Processors (CMPs) with tens or even hundreds of interconnected cores or tiles. Unfortunately, deep sub-micron CMOS process technology is marred by increasing susceptibility to wearout. Prolonged operational stress gives rise to accelerated wearout and failure, due to several physical failure mechanisms, including Hot Carrier Injection (HCI) and Negative Bias Temperature Instability (NBTI). Each failure mechanism correlates with different usage-based stresses, all of which can eventually generate permanent faults. While the wearout of an individual core in many-core CMPs may not necessarily be catastrophic for the system, a single fault in the inter-processor Network-on-Chip (NoC) fabric could render the entire chip useless, as it could lead to protocol-level deadlocks, or even partition away vital components such as the memory controller or other critical I/O. In this paper, we develop critical path models for HCI- and NBTI-induced wear due to the actual stresses caused by real workloads, applied onto the interconnect microarchitecture. A key finding from this modeling is that, counter to prevailing wisdom, wearout in the CMP on-chip interconnect is correlated with lack of load observed in the NoC routers, rather than high load. We then develop a novel wearout-decelerating scheme in which routers under low load have their wearout-sensitive components exercised, without significantly impacting cycle time, pipeline depth, area or power consumption of the overall router. We subsequently show that the proposed design yields a 13.8x-65x increase in CMP lifetime.",
keywords = "hot carrier injection (HCI), lifetime, negative bias temperature instability (NBTI), network-on-chip, reliability, wearout",
author = "Hyungjun Kim and Arseniy Vitkovskiy and Gratz, {Paul V.} and Vassos Soteriou",
year = "2013",
month = "12",
day = "1",
doi = "10.1145/2540708.2540721",
language = "English (US)",
isbn = "9781450326384",
pages = "136--147",
booktitle = "MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture",

}

TY - GEN

T1 - Use it or lose it

T2 - Wear-out and lifetime in future chip multiprocessors

AU - Kim, Hyungjun

AU - Vitkovskiy, Arseniy

AU - Gratz, Paul V.

AU - Soteriou, Vassos

PY - 2013/12/1

Y1 - 2013/12/1

N2 - Moore's Law scaling is continuing to yield even higher transistor density with each succeeding process generation, leading to today's multi-core Chip Multi-Processors (CMPs) with tens or even hundreds of interconnected cores or tiles. Unfortunately, deep sub-micron CMOS process technology is marred by increasing susceptibility to wearout. Prolonged operational stress gives rise to accelerated wearout and failure, due to several physical failure mechanisms, including Hot Carrier Injection (HCI) and Negative Bias Temperature Instability (NBTI). Each failure mechanism correlates with different usage-based stresses, all of which can eventually generate permanent faults. While the wearout of an individual core in many-core CMPs may not necessarily be catastrophic for the system, a single fault in the inter-processor Network-on-Chip (NoC) fabric could render the entire chip useless, as it could lead to protocol-level deadlocks, or even partition away vital components such as the memory controller or other critical I/O. In this paper, we develop critical path models for HCI- and NBTI-induced wear due to the actual stresses caused by real workloads, applied onto the interconnect microarchitecture. A key finding from this modeling is that, counter to prevailing wisdom, wearout in the CMP on-chip interconnect is correlated with lack of load observed in the NoC routers, rather than high load. We then develop a novel wearout-decelerating scheme in which routers under low load have their wearout-sensitive components exercised, without significantly impacting cycle time, pipeline depth, area or power consumption of the overall router. We subsequently show that the proposed design yields a 13.8x-65x increase in CMP lifetime.

KW - hot carrier injection (HCI)

KW - lifetime

KW - negative bias temperature instability (NBTI)

KW - network-on-chip

KW - reliability

KW - wearout

UR - http://www.scopus.com/inward/record.url?scp=84892513006&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84892513006&partnerID=8YFLogxK

U2 - 10.1145/2540708.2540721

DO - 10.1145/2540708.2540721

M3 - Conference contribution

AN - SCOPUS:84892513006

SN - 9781450326384

SP - 136

EP - 147

BT - MICRO 2013 - Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

ER -