Workload-cognizant concurrent error detection in the scheduler of a modern microprocessor

Naghmeh Karimi, Mihalis Maniatakos, Abhijit Jas, Chandrasekharan Tirumurti, Yiorgos Makris

Research output: Contribution to journal › Article

Abstract

We present a Concurrent Error Detection (CED) scheme for the Scheduler of a modern microprocessor. The proposed CED scheme is based on monitoring a set of invariances imposed through added hardware, violation of which signifies the occurrence of an error. The novelty of our solution stems from the workload-cognizant way in which these invariances are selected so that they leverage the application-level error masking inherent in program execution. Specifically, in order to ensure cost-effectiveness of the hardware employed to construct these invariances, we make use of information regarding the type and frequency of errors affecting the typical workload of the microprocessor. Thereby, we identify the most susceptible aspects of instruction execution and we accordingly distribute CED resources to protect them. Our approach is demonstrated on the Scheduler of an Alpha-like superscalar microprocessor with dynamic scheduling, hybrid branch prediction and out-of-order execution capabilities. Using an extensive fault-simulation infrastructure that we developed around this microprocessor, we profile the impact of Scheduler faults across a variety of different SPEC2000 benchmarks. Based on the results, we construct a CED scheme which monitors the time and location of instruction execution, the executed operation, the utilized resources, as well as the executed and retired sequence of instructions. At a hardware cost of only 32 percent of the Scheduler, the corresponding CED scheme detects over 85 percent of its faults that affect the architectural state of the microprocessor. Furthermore, over 99.5 percent of these faults are detected before they corrupt the architectural state, while the average detection latency for the remaining faults is in the order of a few clock cycles, implying that efficient recovery methods can be developed.
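To make the idea of monitoring an invariance concrete, the following minimal sketch models one of the properties named in the abstract, the retired sequence of instructions, as a software check. It is an illustration under stated assumptions, not the authors' hardware design: the class name `RetirementOrderChecker`, the use of consecutive sequence numbers to tag instructions, and the event model are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RetirementOrderChecker:
    """Toy model of one invariance: instructions retire in program order.

    Hypothetical illustration only; the paper builds its checks in hardware.
    """

    next_expected: int = 0  # sequence number that must retire next
    violations: list = field(default_factory=list)

    def retire(self, seq_no: int) -> bool:
        """Record a retirement event; return True if the invariance holds."""
        if seq_no != self.next_expected:
            # Invariance violated: log (expected, observed) and flag an error.
            self.violations.append((self.next_expected, seq_no))
            return False
        self.next_expected += 1
        return True


checker = RetirementOrderChecker()
checker.retire(0)          # in order
checker.retire(1)          # in order
ok = checker.retire(3)     # instruction 2 was skipped, so this is flagged
print(ok, checker.violations)
```

In this sketch a violation is reported at the moment the out-of-order retirement is observed, which mirrors the abstract's point that an error is signalled when an imposed invariance is violated.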

Original language: English (US)
Article number: 5669287
Pages (from-to): 1274-1287
Number of pages: 14
Journal: IEEE Transactions on Computers
Volume: 60
Issue number: 9
DOI: 10.1109/TC.2010.265
State: Published - Aug 8 2011

Fingerprint

Error Detection
Microprocessor
Scheduler
Workload
Microprocessor chips
Concurrent
Fault
Invariance
Percent
Hardware
Branch Prediction
Fault Simulation
Superscalar
Dynamic Scheduling
Resources
Cost-effectiveness
Masking
Leverage

Keywords

  • Concurrent error detection
  • invariance
  • microprocessor
  • scheduler

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Hardware and Architecture
  • Computational Theory and Mathematics

Cite this

Workload-cognizant concurrent error detection in the scheduler of a modern microprocessor. / Karimi, Naghmeh; Maniatakos, Mihalis; Jas, Abhijit; Tirumurti, Chandrasekharan; Makris, Yiorgos.

In: IEEE Transactions on Computers, Vol. 60, No. 9, 5669287, 08.08.2011, p. 1274-1287.

@article{22563fbf596f47bcb01794d79c1a7213,
title = "Workload-cognizant concurrent error detection in the scheduler of a modern microprocessor",
keywords = "Concurrent error detection, invariance, microprocessor, scheduler",
author = "Naghmeh Karimi and Mihalis Maniatakos and Abhijit Jas and Chandrasekharan Tirumurti and Yiorgos Makris",
year = "2011",
month = "8",
day = "8",
doi = "10.1109/TC.2010.265",
language = "English (US)",
volume = "60",
pages = "1274--1287",
journal = "IEEE Transactions on Computers",
issn = "0018-9340",
publisher = "IEEE Computer Society",
number = "9",

}

TY - JOUR

T1 - Workload-cognizant concurrent error detection in the scheduler of a modern microprocessor

AU - Karimi, Naghmeh

AU - Maniatakos, Mihalis

AU - Jas, Abhijit

AU - Tirumurti, Chandrasekharan

AU - Makris, Yiorgos

PY - 2011/8/8

Y1 - 2011/8/8

AB - We present a Concurrent Error Detection (CED) scheme for the Scheduler of a modern microprocessor. The proposed CED scheme is based on monitoring a set of invariances imposed through added hardware, violation of which signifies the occurrence of an error. The novelty of our solution stems from the workload-cognizant way in which these invariances are selected so that they leverage the application-level error masking inherent in program execution. Specifically, in order to ensure cost-effectiveness of the hardware employed to construct these invariances, we make use of information regarding the type and frequency of errors affecting the typical workload of the microprocessor. Thereby, we identify the most susceptible aspects of instruction execution and we accordingly distribute CED resources to protect them. Our approach is demonstrated on the Scheduler of an Alpha-like superscalar microprocessor with dynamic scheduling, hybrid branch prediction and out-of-order execution capabilities. Using an extensive fault-simulation infrastructure that we developed around this microprocessor, we profile the impact of Scheduler faults across a variety of different SPEC2000 benchmarks. Based on the results, we construct a CED scheme which monitors the time and location of instruction execution, the executed operation, the utilized resources, as well as the executed and retired sequence of instructions. At a hardware cost of only 32 percent of the Scheduler, the corresponding CED scheme detects over 85 percent of its faults that affect the architectural state of the microprocessor. Furthermore, over 99.5 percent of these faults are detected before they corrupt the architectural state, while the average detection latency for the remaining faults is in the order of a few clock cycles, implying that efficient recovery methods can be developed.

KW - Concurrent error detection

KW - invariance

KW - microprocessor

KW - scheduler

UR - http://www.scopus.com/inward/record.url?scp=79961062447&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79961062447&partnerID=8YFLogxK

U2 - 10.1109/TC.2010.265

DO - 10.1109/TC.2010.265

M3 - Article

VL - 60

SP - 1274

EP - 1287

JO - IEEE Transactions on Computers

JF - IEEE Transactions on Computers

SN - 0018-9340

IS - 9

M1 - 5669287

ER -