Supporting very large models using automatic dataflow graph partitioning

Minjie Wang, Chien-Chin Huang, Jinyang Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents Tofu, a system that partitions very large DNN models across multiple GPU devices to reduce per-GPU memory footprint. Tofu is designed to partition a dataflow graph of fine-grained tensor operators used by platforms like MXNet and TensorFlow. To partition each operator automatically, we propose describing the semantics of an operator in a simple language inspired by Halide. To partition the different operators in a dataflow graph optimally, Tofu uses a recursive search algorithm that minimizes the total communication cost. Our experiments on an 8-GPU machine show that Tofu enables the training of very large CNN and RNN models. It also achieves a 25%–400% speedup over alternative approaches to training very large models.
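
The abstract compresses two technical ideas: operator semantics are written in a compact, Halide-inspired notation, and a search over per-operator partition choices minimizes total communication. The Python sketch below is a rough illustration of both under made-up conventions; the dictionary encoding, the names candidate_strategies and comm_cost, and the toy cost model are illustrative assumptions, not Tofu's actual description language or algorithm.

# Illustrative sketch only; NOT Tofu's API. An operator's semantics are
# encoded by which index variables address each tensor, Halide-style:
# matmul is C[i, j] = sum over k of A[i, k] * B[k, j].
MATMUL = {
    "output":    ("C", ("i", "j")),
    "inputs":    [("A", ("i", "k")), ("B", ("k", "j"))],
    "reduction": ("k",),
}

def candidate_strategies(op):
    """One 'partition-n-reduce' strategy per loop index: splitting along
    an index partitions every tensor addressed by it; tensors not
    addressed by it are replicated (for a reduction index, the output
    holds partial results that must be summed afterwards)."""
    tensors = [op["output"]] + op["inputs"]
    for idx in op["output"][1] + op["reduction"]:
        plan = {name: ("split" if idx in dims else "replicate")
                for name, dims in tensors}
        yield idx, plan

def comm_cost(plan, sizes, workers=2):
    """Toy cost model (an assumption, not the paper's): each replicated
    tensor is copied to every other worker; split tensors are free."""
    return sum((workers - 1) * sizes[name]
               for name, action in plan.items() if action == "replicate")

# With a small A and a large B and C, splitting along j (which
# replicates only A) is the cheapest of the three strategies.
sizes = {"A": 4096 * 4096, "B": 4096 * 16384, "C": 4096 * 16384}
idx, plan = min(candidate_strategies(MATMUL),
                key=lambda s: comm_cost(s[1], sizes))
print("cheapest split index:", idx, "plan:", plan)

The real system scores choices like these jointly across the whole dataflow graph and applies two-way splits recursively to reach 2^k partitions; the sketch above only ranks the options for a single operator.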

Original language: English (US)
Title of host publication: Proceedings of the 14th EuroSys Conference 2019
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450362818
DOI: 10.1145/3302424.3303953
State: Published - Mar 25, 2019
Event: 14th European Conference on Computer Systems, EuroSys 2019 - Dresden, Germany
Duration: Mar 25, 2019 - Mar 28, 2019

Publication series

Name: Proceedings of the 14th EuroSys Conference 2019

Conference

Conference: 14th European Conference on Computer Systems, EuroSys 2019
Country: Germany
City: Dresden
Period: 3/25/19 - 3/28/19


ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

Wang, M., Huang, C. C., & Li, J. (2019). Supporting very large models using automatic dataflow graph partitioning. In Proceedings of the 14th EuroSys Conference 2019 [3303953] (Proceedings of the 14th EuroSys Conference 2019). Association for Computing Machinery, Inc. https://doi.org/10.1145/3302424.3303953

