CompAct: On-chip compression of activations for low power systolic array based CNN acceleration

Jeff Zhang, Parul Raj, Shuayb Zarar, Amol Ambardekar, Siddharth Garg

Research output: Contribution to journal › Article

Abstract

This paper addresses the design of systolic array (SA) based convolutional neural network (CNN) accelerators for mobile and embedded domains. On- and off-chip memory accesses to the large activation inputs (sometimes called feature maps) of CNN layers contribute significantly to the total energy consumption of such accelerators; while prior work has proposed off-chip compression, activations are still stored on-chip in uncompressed form, requiring either large on-chip activation buffers or slow and energy-hungry off-chip accesses. In this paper, we propose CompAct, a new architecture that enables on-chip compression of activations for SA based CNN accelerators. CompAct is built around several key ideas. First, CompAct identifies an SA schedule that has nearly regular access patterns, enabling the use of a modified run-length coding (RLC) scheme. Second, CompAct improves the compression ratio of the RLC scheme by using Sparse-RLC in later CNN layers and Lossy-RLC in earlier layers. Finally, CompAct proposes look-ahead snoozing, which operates synergistically with RLC to reduce the leakage energy of activation buffers. Based on detailed synthesis results, we show that CompAct enables up to a 62% reduction in activation buffer energy and a 34% reduction in total chip energy.
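
To make the compression idea concrete, the following is a minimal, illustrative Python sketch of zero-run-length coding of post-ReLU activations, in the spirit of the Sparse-RLC variant mentioned above. The function names and the fixed 4-bit run-length field are assumptions made for illustration only; they do not reproduce CompAct's hardware encoding format.

from typing import List, Optional, Tuple

RUN_BITS = 4                          # assumed width of the zero-run field (illustrative)
MAX_RUN = (1 << RUN_BITS) - 1         # longest zero run encodable in a single symbol

Symbol = Tuple[int, Optional[int]]    # (number of preceding zeros, value or None)


def sparse_rlc_encode(activations: List[int]) -> List[Symbol]:
    """Encode an activation stream as (zero_run, value) symbols.

    Post-ReLU activations are often mostly zero, so recording only the length
    of each zero run plus the next value compresses sparse layers well.
    """
    symbols: List[Symbol] = []
    run = 0
    for a in activations:
        if a == 0 and run < MAX_RUN:
            run += 1                  # extend the current run of zeros
        else:
            symbols.append((run, a))  # emit the zero run followed by this value
            run = 0
    if run:
        symbols.append((run, None))   # trailing zeros with no following value
    return symbols


def sparse_rlc_decode(symbols: List[Symbol]) -> List[int]:
    """Reconstruct the original activation stream from the symbols."""
    out: List[int] = []
    for run, value in symbols:
        out.extend([0] * run)
        if value is not None:
            out.append(value)
    return out


if __name__ == "__main__":
    acts = [0, 0, 0, 7, 0, 5, 0, 0, 0, 0, 0, 0, 9, 0, 0]
    enc = sparse_rlc_encode(acts)
    assert sparse_rlc_decode(enc) == acts
    print(f"{len(acts)} activations -> {len(enc)} symbols: {enc}")

Running the script shows 15 raw activations collapsing into 4 (run, value) symbols, which illustrates why the highly sparse later layers of a CNN compress well under such a scheme.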

Original language: English (US)
Article number: a47
Journal: ACM Transactions on Embedded Computing Systems
Volume: 18
Issue number: 5s
DOI: 10.1145/3358178
State: Published - Oct 2019

Keywords

  • Deep neural networks
  • Low-power design
  • Systolic arrays

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture

Cite this

CompAct: On-chip compression of activations for low power systolic array based CNN acceleration. / Zhang, Jeff; Raj, Parul; Zarar, Shuayb; Ambardekar, Amol; Garg, Siddharth.

In: ACM Transactions on Embedded Computing Systems, Vol. 18, No. 5s, a47, 10.2019.

Research output: Contribution to journal › Article

@article{ea1385ad045b47b1a0d1ea8dea53eb24,
title = "Compact: On-chip compression of activations for low power systolic array based CNN acceleration",
abstract = "This paper addresses the design of systolic array (SA) based convolutional neural network (CNN) accelerators for mobile and embedded domains. On- and off-chip memory accesses to the large activation inputs (sometimes called feature maps) of CNN layers contribute significantly to total energy consumption for such accelerators; while prior has proposed off-chip compression, activations are still stored on-chip in uncompressed form, requiring either large on-chip activation buffers or slow and energy-hungry off-chip accesses. In this paper, we propose CompAct, a new architecture that enables on-chip compression of activations for SA based CNN accelerators. CompAct is built around several key ideas. First, CompAct identifies an SA schedule that has nearly regular access patterns, enabling the use of a modified run-length coding scheme (RLC). Second, CompAct improves compression ratio of the RLC scheme using Sparse-RLC in later CNN layers and Lossy-RLC in earlier layers. Finally, CompAct proposes look-ahead snoozing that operates synergistically with RLC to reduce the leakage energy of activation buffers. Based on detailed synthesis results, we show that CompAct enables up to 62{\%} reduction in activation buffer energy, and 34{\%} reduction in total chip energy.",
keywords = "Deep neural networks, Low-power design, Systolic arrays",
author = "Jeff Zhang and Parul Raj and Shuayb Zarar and Amol Ambardekar and Siddharth Garg",
year = "2019",
month = "10",
doi = "10.1145/3358178",
language = "English (US)",
volume = "18",
journal = "Transactions on Embedded Computing Systems",
issn = "1539-9087",
publisher = "Association for Computing Machinery (ACM)",
number = "5s",

}
