Optimal self-recovering microarchitecture synthesis

Ramesh Karri, Alex Orailoglu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper, we propose a novel integer linear programming (ILP) model for the scheduling problem in self-recovering microarchitecture synthesis. A self-recovering microarchitecture, on detecting a transient fault, rolls back to a previous known-correct state (the checkpoint) and retries the computation. The maximum distance between adjacent checkpoints (the retry period) is determined by the transient fault rate and the average lifetime of a transient fault. At a checkpoint, the results of intermediate computations are compared using voters and, if correct, saved in registers. Consequently, each checkpoint carries a time overhead due to the comparison and an area overhead due to the fault-tolerant nature of the voters. First, we formulate time-constrained scheduling as minimizing either the number of voters or the overall hardware, subject to constraints on the number of clock cycles, the retry period, and the number of checkpoints. Second, we develop a model for resource-constrained scheduling in which both the overall system performance and the recovery time overhead are optimized subject to hardware constraints.
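
The retry-period constraint described in the abstract can be illustrated with a toy example. The sketch below is not the ILP model of the paper: it is a brute-force search, under simplifying assumptions, for the fewest checkpoints such that no stretch of computation between adjacent checkpoints exceeds the retry period. The function name `min_checkpoints`, the treatment of the first and last control steps as implicit checkpoints, and the omission of voter time and area overheads are all assumptions made for illustration.

```python
from itertools import combinations


def min_checkpoints(num_steps, retry_period):
    """Return a smallest list of interior control steps to checkpoint.

    Hypothetical toy model (not the paper's ILP): the schedule spans
    control steps 0..num_steps, with step 0 and step num_steps treated
    as implicit checkpoints. A placement is feasible when every gap
    between adjacent checkpoints is at most retry_period steps.
    Returns None if no feasible placement exists.
    """
    interior = range(1, num_steps)
    # Try placements in order of increasing checkpoint count, so the
    # first feasible placement found uses the fewest checkpoints.
    for count in range(0, num_steps):
        for chosen in combinations(interior, count):
            marks = (0, *chosen, num_steps)
            gaps = (later - earlier for earlier, later in zip(marks, marks[1:]))
            if all(gap <= retry_period for gap in gaps):
                return list(chosen)
    return None


if __name__ == "__main__":
    # A 10-step schedule with a retry period of 4 steps.
    print(min_checkpoints(10, 4))
```

An exhaustive search like this only scales to very small schedules; an ILP formulation, as proposed in the paper, hands the same kind of constraint to a solver together with the scheduling and resource constraints.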

Original language: English (US)
Title of host publication: Digest of Papers - International Symposium on Fault-Tolerant Computing
Editors: Anon
Publisher: Publ by IEEE
Pages: 512-521
Number of pages: 10
ISBN (Print): 0818636823
State: Published - 1993
Event: Proceedings of the 23rd International Symposium on Fault-Tolerant Computing - Toulouse, Fr
Duration: Jun 22 1993 to Jun 24 1993

Fingerprint

Scheduling
Hardware
Integer linear programming (ILP)
Clocks
Recovery

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Karri, R., & Orailoglu, A. (1993). Optimal self-recovering microarchitecture synthesis. In Anon (Ed.), Digest of Papers - International Symposium on Fault-Tolerant Computing (pp. 512-521). Publ by IEEE.

@inproceedings{737b432f84f0446f962516d774117971,
title = "Optimal self-recovering microarchitecture synthesis",
abstract = "In this paper, we propose a novel integer linear programming (ILP) model for the scheduling problem in self-recovering microarchitecture synthesis. A self-recovering microarchitecture, on detecting a transient fault, rolls back to a previous known-correct state (the checkpoint) and retries the computation. The maximum distance between adjacent checkpoints (the retry period) is determined by the transient fault rate and the average lifetime of a transient fault. At a checkpoint, the results of intermediate computations are compared using voters and, if correct, saved in registers. Consequently, each checkpoint carries a time overhead due to the comparison and an area overhead due to the fault-tolerant nature of the voters. First, we formulate time-constrained scheduling as minimizing either the number of voters or the overall hardware, subject to constraints on the number of clock cycles, the retry period, and the number of checkpoints. Second, we develop a model for resource-constrained scheduling in which both the overall system performance and the recovery time overhead are optimized subject to hardware constraints.",
author = "Ramesh Karri and Alex Orailoglu",
year = "1993",
language = "English (US)",
isbn = "0818636823",
pages = "512--521",
editor = "Anon",
booktitle = "Digest of Papers - International Symposium on Fault-Tolerant Computing",
publisher = "Publ by IEEE",

}

TY  - GEN
T1  - Optimal self-recovering microarchitecture synthesis
AU  - Karri, Ramesh
AU  - Orailoglu, Alex
PY  - 1993
Y1  - 1993
AB  - In this paper, we propose a novel integer linear programming (ILP) model for the scheduling problem in self-recovering microarchitecture synthesis. A self-recovering microarchitecture, on detecting a transient fault, rolls back to a previous known-correct state (the checkpoint) and retries the computation. The maximum distance between adjacent checkpoints (the retry period) is determined by the transient fault rate and the average lifetime of a transient fault. At a checkpoint, the results of intermediate computations are compared using voters and, if correct, saved in registers. Consequently, each checkpoint carries a time overhead due to the comparison and an area overhead due to the fault-tolerant nature of the voters. First, we formulate time-constrained scheduling as minimizing either the number of voters or the overall hardware, subject to constraints on the number of clock cycles, the retry period, and the number of checkpoints. Second, we develop a model for resource-constrained scheduling in which both the overall system performance and the recovery time overhead are optimized subject to hardware constraints.
UR  - http://www.scopus.com/inward/record.url?scp=0027846231&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0027846231&partnerID=8YFLogxK
M3  - Conference contribution
SN  - 0818636823
SP  - 512
EP  - 521
BT  - Digest of Papers - International Symposium on Fault-Tolerant Computing
A2  - Anon
PB  - Publ by IEEE
ER  -