Analyzing and mitigating the impact of permanent faults on a systolic array based neural network accelerator

Jeff Zhang, Tianyu Gu, Kanad Basu, Siddharth Garg

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Due to their growing popularity and computational cost, deep neural networks (DNNs) are being targeted for hardware acceleration. A popular architecture for DNN acceleration, adopted by the Google Tensor Processing Unit (TPU), utilizes a systolic-array-based matrix multiplication unit at its core. This paper deals with the design of fault-tolerant, systolic-array-based DNN accelerators for high-defect-rate technologies. To this end, we empirically show that the classification accuracy of a baseline TPU drops significantly even at extremely low fault rates (as low as 0.006%). We then propose two novel strategies, fault-aware pruning (FAP) and fault-aware pruning+retraining (FAP+T), that enable the TPU to operate at fault rates of up to 50%, with a negligible drop in classification accuracy (as low as 0.1%) and no run-time performance overhead. FAP+T does introduce a one-time retraining penalty per TPU chip before it is deployed, but we propose optimizations that reduce this one-time penalty to under 12 minutes. The penalty is then amortized over the entire lifetime of the TPU's operation.
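
The fault-aware pruning (FAP) idea can be illustrated with a short sketch: weights that would be loaded into permanently faulty MAC units of the systolic array are forced to zero, so those units effectively pass partial sums through unchanged. The sketch below is a minimal illustration under stated assumptions; the function name fault_aware_prune, the row-major tiling of a layer's weight matrix onto a 256x256 array, and the NumPy setting are hypothetical choices, not the paper's exact dataflow or implementation.

```python
import numpy as np

def fault_aware_prune(weights, fault_map, array_size=256):
    """Zero out the weights that would map onto faulty MAC units.

    weights:   2-D weight matrix of one layer (rows x cols).
    fault_map: boolean (array_size x array_size) matrix; True marks a
               permanently faulty MAC unit in the systolic array.
    Assumes the weight matrix is tiled onto the array in row-major
    blocks of array_size x array_size (an illustrative mapping only).
    """
    pruned = weights.copy()
    rows, cols = weights.shape
    for r0 in range(0, rows, array_size):
        for c0 in range(0, cols, array_size):
            # View into the tile that would occupy the array; zero the
            # positions held by faulty MACs.
            tile = pruned[r0:r0 + array_size, c0:c0 + array_size]
            tile[fault_map[:tile.shape[0], :tile.shape[1]]] = 0.0
    return pruned

# Example: a 512x512 layer mapped onto a 256x256 array with ~1% faulty MACs.
rng = np.random.default_rng(0)
w = rng.standard_normal((512, 512))
faults = rng.random((256, 256)) < 0.01
w_pruned = fault_aware_prune(w, faults)
print("weights forced to zero:", int((w_pruned == 0).sum()))
```

FAP+T would follow the same masking step with a short fine-tuning pass in which the pruned positions are held at zero (for example, by re-applying the mask after every weight update), which is how the accuracy loss can stay small even at high fault rates.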

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE 36th VLSI Test Symposium, VTS 2018
Publisher: IEEE Computer Society
Pages: 1-6
Number of pages: 6
Volume: 2018-April
ISBN (Electronic): 9781538637746
DOI: 10.1109/VTS.2018.8368656
State: Published - May 29 2018
Event: 36th IEEE VLSI Test Symposium, VTS 2018 - San Francisco, United States
Duration: Apr 22 2018 – Apr 25 2018

Other

Other: 36th IEEE VLSI Test Symposium, VTS 2018
Country: United States
City: San Francisco
Period: 4/22/18 – 4/25/18

Fingerprint

  • Systolic arrays
  • Particle accelerators
  • Tensors
  • Neural networks
  • Processing
  • Hardware
  • Defects
  • Deep neural networks
  • Costs

ASJC Scopus subject areas

  • Computer Science Applications
  • Electrical and Electronic Engineering

Cite this

Zhang, J., Gu, T., Basu, K., & Garg, S. (2018). Analyzing and mitigating the impact of permanent faults on a systolic array based neural network accelerator. In Proceedings - 2018 IEEE 36th VLSI Test Symposium, VTS 2018 (Vol. 2018-April, pp. 1-6). IEEE Computer Society. https://doi.org/10.1109/VTS.2018.8368656
