Hermes: Architecting a top-performing fault-tolerant routing algorithm for Networks-on-Chips

Costas Iordanou, Vassos Soteriou, Konstantinos Aisopos

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Networks-on-Chips (NoCs) are experiencing escalating susceptibility to wear-out and reduced reliability, with the risk of becoming the key point of failure in an entire multicore chip. Aiming towards seamless NoC operation in the presence of faulty communication links, in this paper we propose Hermes, a highly robust, distributed and lightweight fault-tolerant routing algorithm whose performance degrades gracefully with increasing faulty link counts. Hermes is a deadlock-free hybrid routing algorithm that uses load-balancing routing on fault-free paths to sustain high performance, while providing pre-reconfigured escape path selection in the vicinity of faults. Additionally, Hermes identifies non-communicating network partitions in scenarios where faulty links are topologically densely distributed. An extensive experimental evaluation, including traffic benchmarks gathered from full-system chip multiprocessor simulations, shows that Hermes improves network throughput by up to 3× compared with prior art.
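The abstract names two mechanisms without detailing them: deadlock-free escape paths near faults, and detection of non-communicating partitions. The Python sketch below illustrates both ideas under stated assumptions; it is not Hermes itself. It assumes a 2D mesh whose faulty links fail in both directions, uses classic up*/down* routing as the escape scheme (the abstract does not say which escape scheme Hermes uses), and computes everything centrally with breadth-first search, whereas Hermes is described as distributed. The function names `build_mesh`, `label_partitions`, `is_up`, and `escape_path` are hypothetical.

```python
from collections import deque

# Illustrative sketch only: up*/down* escape routing and centralized BFS are
# assumed stand-ins, not the mechanism published for Hermes.


def build_mesh(width, height, faulty_links):
    """Adjacency lists for a width x height 2D mesh (row-major node ids),
    omitting every link in faulty_links (treated as bidirectional)."""
    bad = {frozenset(link) for link in faulty_links}
    adj = {n: [] for n in range(width * height)}
    for y in range(height):
        for x in range(width):
            n = y * width + x
            # Look only east and south so each physical link is visited once.
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < width and ny < height:
                    m = ny * width + nx
                    if frozenset((n, m)) not in bad:
                        adj[n].append(m)
                        adj[m].append(n)
    return adj


def label_partitions(adj):
    """BFS from the lowest unvisited node of each connected component.
    part[n] is the id of n's component root; level[n] is its BFS depth."""
    part, level = {}, {}
    for root in sorted(adj):
        if root in part:
            continue
        part[root], level[root] = root, 0
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in part:
                    part[v], level[v] = root, level[u] + 1
                    queue.append(v)
    return part, level


def is_up(u, v, level):
    """Hop u -> v is 'up' if it moves toward the root; ties broken by node id."""
    return (level[v], v) < (level[u], u)


def escape_path(adj, part, level, src, dst):
    """Shortest legal up*/down* path (up hops, then down hops), or None if
    src and dst lie in different partitions."""
    if part[src] != part[dst]:
        return None
    start = (src, False)  # state = (node, has_taken_a_down_hop)
    prev = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        u, went_down = state
        if u == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = prev[state]
            return path[::-1]
        for v in adj[u]:
            up = is_up(u, v, level)
            if went_down and up:
                continue  # a down -> up turn is the forbidden turn
            nxt = (v, went_down or not up)
            if nxt not in prev:
                prev[nxt] = state
                queue.append(nxt)
    return None


# Usage: fault-free 3x3 mesh; the only legal 4-hop route climbs to the root first.
mesh = build_mesh(3, 3, [])
mesh_part, mesh_level = label_partitions(mesh)
print(escape_path(mesh, mesh_part, mesh_level, 2, 6))  # [2, 1, 0, 3, 6]

# 2x2 mesh with both links of node 3 faulty: node 3 forms its own partition.
cut = build_mesh(2, 2, [(1, 3), (2, 3)])
cut_part, cut_level = label_partitions(cut)
print(escape_path(cut, cut_part, cut_level, 0, 3))  # None
```

Restricting every route to up hops followed by down hops means no packet ever turns from a down link onto an up link, which is the standard argument for why up*/down* routing cannot form a cyclic channel dependency. Any source/destination pair for which `escape_path` returns `None` lies in different partitions, which is the kind of non-communicating pair the abstract says Hermes identifies.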

Original language: English (US)
Title of host publication: 2014 32nd IEEE International Conference on Computer Design, ICCD 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 424-431
Number of pages: 8
ISBN (Electronic): 9781479964925
DOI: 10.1109/ICCD.2014.6974715
State: Published - Jan 1 2014
Event: 32nd IEEE International Conference on Computer Design, ICCD 2014 - Seoul, Korea, Republic of
Duration: Oct 19 2014 to Oct 22 2014

Other

Other: 32nd IEEE International Conference on Computer Design, ICCD 2014
Country: Korea, Republic of
City: Seoul
Period: 10/19/14 to 10/22/14

Fingerprint

Routing algorithms
Resource allocation
Telecommunication links
Throughput
Wear of materials
Network-on-chip

Keywords

  • chip multi-processor
  • fault-tolerance
  • Network-on-chip
  • reliability
  • routing algorithm

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Science Applications

Cite this

Iordanou, C., Soteriou, V., & Aisopos, K. (2014). Hermes: Architecting a top-performing fault-tolerant routing algorithm for Networks-on-Chips. In 2014 32nd IEEE International Conference on Computer Design, ICCD 2014 (pp. 424-431). [6974715] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCD.2014.6974715

Hermes: Architecting a top-performing fault-tolerant routing algorithm for Networks-on-Chips. / Iordanou, Costas; Soteriou, Vassos; Aisopos, Konstantinos.

2014 32nd IEEE International Conference on Computer Design, ICCD 2014. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 424-431, 6974715.


Iordanou, C, Soteriou, V & Aisopos, K 2014, Hermes: Architecting a top-performing fault-tolerant routing algorithm for Networks-on-Chips. in 2014 32nd IEEE International Conference on Computer Design, ICCD 2014, 6974715, Institute of Electrical and Electronics Engineers Inc., pp. 424-431, 32nd IEEE International Conference on Computer Design, ICCD 2014, Seoul, Korea, Republic of, 10/19/14. https://doi.org/10.1109/ICCD.2014.6974715
Iordanou C, Soteriou V, Aisopos K. Hermes: Architecting a top-performing fault-tolerant routing algorithm for Networks-on-Chips. In 2014 32nd IEEE International Conference on Computer Design, ICCD 2014. Institute of Electrical and Electronics Engineers Inc. 2014. p. 424-431. 6974715 https://doi.org/10.1109/ICCD.2014.6974715
Iordanou, Costas; Soteriou, Vassos; Aisopos, Konstantinos. / Hermes: Architecting a top-performing fault-tolerant routing algorithm for Networks-on-Chips. 2014 32nd IEEE International Conference on Computer Design, ICCD 2014. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 424-431
@inproceedings{807037ebb09a4d28b1c6e2d753b06315,
title = "{Hermes}: Architecting a top-performing fault-tolerant routing algorithm for {Networks-on-Chips}",
abstract = "Networks-on-Chips (NoCs) are experiencing escalating susceptibility to wear-out and reduced reliability, with the risk of becoming the key point of failure in an entire multicore chip. Aiming towards seamless NoC operation in the presence of faulty communication links, in this paper we propose Hermes, a highly-robust, distributed and lightweight fault-tolerant routing algorithm, whose performance degrades gracefully with increasing faulty link counts. Hermes is a deadlock-free hybrid routing algorithm, utilizing load-balancing routing on fault-free paths to sustain high-performance, while providing pre-reconfigured escape path selection in the vicinity of faults. Additionally, Hermes identifies non-communicating network partitions in scenarios where faulty links are topologically densely distributed. An extensive experimental evaluation, including utilizing traffic benchmarks gathered from full-system chip multi-processor simulations, shows that Hermes improves network throughput by up to 3× when compared against prior-art.",
keywords = "chip multi-processor, fault-tolerance, Network-on-chip, reliability, routing algorithm",
author = "Costas Iordanou and Vassos Soteriou and Konstantinos Aisopos",
year = "2014",
month = "1",
day = "1",
doi = "10.1109/ICCD.2014.6974715",
language = "English (US)",
pages = "424--431",
booktitle = "2014 32nd IEEE International Conference on Computer Design, ICCD 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY  - GEN
T1  - Hermes: Architecting a top-performing fault-tolerant routing algorithm for Networks-on-Chips
AU  - Iordanou, Costas
AU  - Soteriou, Vassos
AU  - Aisopos, Konstantinos
PY  - 2014/1/1
Y1  - 2014/1/1
AB  - Networks-on-Chips (NoCs) are experiencing escalating susceptibility to wear-out and reduced reliability, with the risk of becoming the key point of failure in an entire multicore chip. Aiming towards seamless NoC operation in the presence of faulty communication links, in this paper we propose Hermes, a highly-robust, distributed and lightweight fault-tolerant routing algorithm, whose performance degrades gracefully with increasing faulty link counts. Hermes is a deadlock-free hybrid routing algorithm, utilizing load-balancing routing on fault-free paths to sustain high-performance, while providing pre-reconfigured escape path selection in the vicinity of faults. Additionally, Hermes identifies non-communicating network partitions in scenarios where faulty links are topologically densely distributed. An extensive experimental evaluation, including utilizing traffic benchmarks gathered from full-system chip multi-processor simulations, shows that Hermes improves network throughput by up to 3× when compared against prior-art.
KW  - chip multi-processor
KW  - fault-tolerance
KW  - Network-on-chip
KW  - reliability
KW  - routing algorithm
UR  - http://www.scopus.com/inward/record.url?scp=84919683804&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84919683804&partnerID=8YFLogxK
U2  - 10.1109/ICCD.2014.6974715
DO  - 10.1109/ICCD.2014.6974715
M3  - Conference contribution
AN  - SCOPUS:84919683804
SP  - 424
EP  - 431
BT  - 2014 32nd IEEE International Conference on Computer Design, ICCD 2014
PB  - Institute of Electrical and Electronics Engineers Inc.
ER  -