Use it or lose it: Proactive, deterministic longevity in future chip multiprocessors

Hyungjun Kim, Siva Bhanu Krishna Boga, Arseniy Vitkovskiy, Stavros Hadjitheophanous, Paul V. Gratz, Vassos Soteriou, Maria K. Michael

Research output: Contribution to journal › Article

Abstract

Moore's Law scaling continues to yield higher transistor density with each succeeding process generation, leading to today's many-core chip multiprocessors (CMPs) with tens or even hundreds of interconnected cores or tiles. Unfortunately, deep submicron CMOS process technology is marred by increasing susceptibility to wear. Prolonged operational stress gives rise to accelerated wearout and failure due to several physical failure mechanisms, including hot-carrier injection (HCI) and negative-bias temperature instability (NBTI). Each failure mechanism correlates with different usage-based stresses, all of which can eventually generate permanent faults. While the wearout of an individual core in a many-core CMP may not necessarily be catastrophic, a single fault in the interprocessor network-on-chip (NoC) fabric could render the entire chip useless, as it could lead to protocol-level deadlocks or even partition away vital components such as the memory controller or other critical I/O. In this article, we study HCI- and NBTI-induced wear due to actual stresses caused by real workloads applied to the interconnect microarchitecture, and we develop a critical path model for NBTI-induced wearout. A key finding of this modeling is that, counter to prevailing wisdom, wearout in the CMP's on-chip interconnect is correlated with a lack of load observed in the NoC routers rather than with high load. We then develop a novel wearout-decelerating scheme in which routers under low load have their wear-sensitive components exercised without significantly impacting the cycle time, pipeline depth, area, or power consumption of the overall router. A novel deterministic approach is proposed for generating appropriate exercise-mode data, ensuring that design parameter targets are met. We subsequently show that the proposed design yields an ∼2,300× decrease in the rate of wear.
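
The abstract's central observation, that idle routers wear faster and that exercising them slows wear, can be illustrated with a toy duty-cycle model. The sketch below is not the authors' critical path model or their deterministic exercise-data generator, neither of which is described in this record. It assumes a simplified storage cell in which one PMOS transistor is under NBTI stress whenever the stored bit is 1 and its cross-coupled partner whenever the bit is 0, and it uses a hypothetical "write the complement every idle cycle" exercise pattern. The function names `worst_case_stress` and `router_trace` are illustrative only.

```python
import random
from typing import List, Sequence


def worst_case_stress(trace: Sequence[int], width: int) -> float:
    """Largest fraction of cycles any single PMOS in a width-bit latch array
    spends under NBTI stress, given the stored word in each cycle."""
    if not trace:
        raise ValueError("trace must be non-empty")
    ones = [0] * width
    for word in trace:
        for bit in range(width):
            ones[bit] += (word >> bit) & 1
    worst = 0.0
    for count in ones:
        p1 = count / len(trace)
        # Assumed cell model: one PMOS is stressed while the bit holds 1 and
        # its partner while it holds 0, so the worse one sees max(p1, 1 - p1).
        worst = max(worst, p1, 1.0 - p1)
    return worst


def router_trace(busy: int, idle: int, width: int, exercise: bool,
                 seed: int = 0) -> List[int]:
    """Per-cycle contents of one (hypothetical) router buffer entry:
    `busy` cycles of random flits, then `idle` cycles with no traffic."""
    rng = random.Random(seed)
    mask = (1 << width) - 1
    trace = [rng.getrandbits(width) for _ in range(busy)]
    word = trace[-1] if trace else 0
    for _ in range(idle):
        trace.append(word)
        if exercise:
            # Hypothetical exercise pattern: store the complement on every
            # idle cycle so each bit spends half its idle time at 0 and at 1.
            word ^= mask
    return trace


if __name__ == "__main__":
    for exercise in (False, True):
        trace = router_trace(busy=100, idle=900, width=32, exercise=exercise)
        print(f"exercise={exercise}: worst-case stress "
              f"{worst_case_stress(trace, 32):.3f}")
```

Under these assumptions, a fully idle 8-bit entry holding a constant word has a worst-case stress fraction of 1.0 (`worst_case_stress(router_trace(0, 1000, 8, False), 8)`), while the complement-toggling variant brings it to 0.5. A lightly loaded router behaves much like the idle one, which is consistent with the abstract's finding that low load, rather than high load, drives interconnect wearout in this setting.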

Original language: English (US)
Article number: A65
Journal: ACM Transactions on Design Automation of Electronic Systems
Volume: 20
Issue number: 4
DOI: 10.1145/2770873
State: Published - Jan 1 2015

Fingerprint

Routers
Wear of materials
Hot carriers
Scaling laws
Tile
Transistors
Electric power utilization
Pipelines
Data storage equipment
Controllers
Negative bias temperature instability
Network-on-chip

Keywords

  • Hot-carrier injection (HCI)
  • Lifetime
  • Negative-bias temperature instability (NBTI)
  • Network-on-chip
  • Reliability
  • Wearout

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design
  • Electrical and Electronic Engineering

Cite this

Use it or lose it: Proactive, deterministic longevity in future chip multiprocessors. / Kim, Hyungjun; Boga, Siva Bhanu Krishna; Vitkovskiy, Arseniy; Hadjitheophanous, Stavros; Gratz, Paul V.; Soteriou, Vassos; Michael, Maria K.

In: ACM Transactions on Design Automation of Electronic Systems, Vol. 20, No. 4, A65, 01.01.2015.

Kim, Hyungjun; Boga, Siva Bhanu Krishna; Vitkovskiy, Arseniy; Hadjitheophanous, Stavros; Gratz, Paul V.; Soteriou, Vassos; Michael, Maria K. / Use it or lose it: Proactive, deterministic longevity in future chip multiprocessors. In: ACM Transactions on Design Automation of Electronic Systems. 2015; Vol. 20, No. 4.
@article{d44d2e655b10402ba3236a661ae871d1,
title = "Use it or lose it: Proactive, deterministic longevity in future chip multiprocessors",
abstract = "Moore's Law scaling continues to yield higher transistor density with each succeeding process generation, leading to today's many-core chip multiprocessors (CMPs) with tens or even hundreds of interconnected cores or tiles. Unfortunately, deep submicron CMOS process technology is marred by increasing susceptibility to wear. Prolonged operational stress gives rise to accelerated wearout and failure due to several physical failure mechanisms, including hot-carrier injection (HCI) and negative-bias temperature instability (NBTI). Each failure mechanism correlates with different usage-based stresses, all of which can eventually generate permanent faults. While the wearout of an individual core in many-core CMPs may not necessarily be catastrophic, a single fault in the interprocessor network-on-chip (NoC) fabric could render the entire chip useless, as it could lead to protocol-level deadlocks, or even partition away vital components such as the memory controller or other critical I/O. In this article, we study HCI- and NBTI-induced wear due to actual stresses caused by real workloads, applied onto the interconnect microarchitecture and develop a critical path model for NBTI-induced wearout. A key finding of this modeling is that, counter to prevailing wisdom, wearout in the CMP's on-chip interconnect is correlated with lack of load observed in the NoC routers rather than high load. We then develop a novel wearout-decelerating scheme in which routers under low load have their wear-sensitive components exercised without significantly impacting cycle time, pipeline depth, area, or power consumption of the overall router. A novel deterministic approach is proposed for the generation of appropriate exercise-mode data, ensuring design parameter targets are met. We subsequently show that the proposed design yields an ∼2,300× decrease in the rate of wear.",
keywords = "Hot-carrier injection (HCI), Lifetime, Negative-bias temperature instability (NBTI), Network-on-chip, Reliability, Wearout",
author = "Hyungjun Kim and Boga, {Siva Bhanu Krishna} and Arseniy Vitkovskiy and Stavros Hadjitheophanous and Gratz, {Paul V.} and Soteriou, Vassos and Michael, {Maria K.}",
year = "2015",
month = "1",
day = "1",
doi = "10.1145/2770873",
language = "English (US)",
volume = "20",
journal = "ACM Transactions on Design Automation of Electronic Systems",
issn = "1084-4309",
publisher = "Association for Computing Machinery (ACM)",
number = "4",

}

TY - JOUR

T1 - Use it or lose it

T2 - Proactive, deterministic longevity in future chip multiprocessors

AU - Kim, Hyungjun

AU - Boga, Siva Bhanu Krishna

AU - Vitkovskiy, Arseniy

AU - Hadjitheophanous, Stavros

AU - Gratz, Paul V.

AU - Soteriou, Vassos

AU - Michael, Maria K.

PY - 2015/1/1

Y1 - 2015/1/1

AB - Moore's Law scaling continues to yield higher transistor density with each succeeding process generation, leading to today's many-core chip multiprocessors (CMPs) with tens or even hundreds of interconnected cores or tiles. Unfortunately, deep submicron CMOS process technology is marred by increasing susceptibility to wear. Prolonged operational stress gives rise to accelerated wearout and failure due to several physical failure mechanisms, including hot-carrier injection (HCI) and negative-bias temperature instability (NBTI). Each failure mechanism correlates with different usage-based stresses, all of which can eventually generate permanent faults. While the wearout of an individual core in many-core CMPs may not necessarily be catastrophic, a single fault in the interprocessor network-on-chip (NoC) fabric could render the entire chip useless, as it could lead to protocol-level deadlocks, or even partition away vital components such as the memory controller or other critical I/O. In this article, we study HCI- and NBTI-induced wear due to actual stresses caused by real workloads, applied onto the interconnect microarchitecture and develop a critical path model for NBTI-induced wearout. A key finding of this modeling is that, counter to prevailing wisdom, wearout in the CMP's on-chip interconnect is correlated with lack of load observed in the NoC routers rather than high load. We then develop a novel wearout-decelerating scheme in which routers under low load have their wear-sensitive components exercised without significantly impacting cycle time, pipeline depth, area, or power consumption of the overall router. A novel deterministic approach is proposed for the generation of appropriate exercise-mode data, ensuring design parameter targets are met. We subsequently show that the proposed design yields an ∼2,300× decrease in the rate of wear.

KW - Hot-carrier injection (HCI)

KW - Lifetime

KW - Negative-bias temperature instability (NBTI)

KW - Network-on-chip

KW - Reliability

KW - Wearout

UR - http://www.scopus.com/inward/record.url?scp=84942935746&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84942935746&partnerID=8YFLogxK

U2 - 10.1145/2770873

DO - 10.1145/2770873

M3 - Article

VL - 20

JO - ACM Transactions on Design Automation of Electronic Systems

JF - ACM Transactions on Design Automation of Electronic Systems

SN - 1084-4309

IS - 4

M1 - A65

ER -