Use it or lose it: Proactive, deterministic longevity in future chip multiprocessors

Hyungjun Kim, Siva Bhanu Krishna Boga, Arseniy Vitkovskiy, Stavros Hadjitheophanous, Paul V. Gratz, Vassos Soteriou, Maria K. Michael

Research output: Contribution to journalArticle

Abstract

Moore's Law scaling continues to yield higher transistor density with each succeeding process generation, leading to today'smany-core chip multiprocessors (CMPs) with tens or even hundreds of interconnected cores or tiles. Unfortunately, deep submicron CMOS process technology is marred by increasing susceptibility to wear. Prolonged operational stress gives rise to accelerated wearout and failure due to several physical failure mechanisms, including hot-carrier injection (HCI) and negative-bias temperature instability (NBTI). Each failure mechanism correlates with different usage-based stresses, all of which can eventually generate permanent faults. While the wearout of an individual core in many-core CMPs may not necessarily be catastrophic, a single fault in the interprocessor network-on-chip (NoC) fabric could render the entire chip useless, as it could lead to protocol-level deadlocks, or even partition away vital components such as the memory controller or other critical I/O. In this article, we study HCI- and NBTI-induced wear due to actual stresses caused by real workloads, applied onto the interconnect microarchitecture and develop a critical path model for NBTI-induced wearout. A key finding of this modeling is that, counter to prevailing wisdom, wearout in the CMP's on-chip interconnect is correlated with lack of load observed in the NoC routers rather than high load. We then develop a novel wearout-decelerating scheme in which routers under low load have their wear-sensitive components exercised without significantly impacting cycle time, pipeline depth, area, or power consumption of the overall router. A novel deterministic approach is proposed for the generation of appropriate exercise-mode data, ensuring design parameter targets are met. We subsequently show that the proposed design yields an ∼2,300× decrease in the rate of wear.

Original languageEnglish (US)
Article numberA65
JournalACM Transactions on Design Automation of Electronic Systems
Volume20
Issue number4
DOIs
StatePublished - Sep 1 2015

Keywords

  • Hot-carrier injection (HCI)
  • Lifetime
  • Negative-bias temperature instability (NBTI)
  • Network-on-chip
  • Reliability
  • Wearout

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design
  • Electrical and Electronic Engineering

Fingerprint Dive into the research topics of 'Use it or lose it: Proactive, deterministic longevity in future chip multiprocessors'. Together they form a unique fingerprint.

  • Cite this