Scheduling large jobs by abstraction refinement

Thomas A. Henzinger, Vasu Singh, Thomas Wies, Damien Zufferey

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The static scheduling problem often arises as a fundamental problem in real-time systems and grid computing. We consider the problem of statically scheduling a large job expressed as a task graph on a large number of computing nodes, such as a data center. This paper solves the large-scale static scheduling problem using abstraction refinement, a technique commonly used in formal verification to efficiently solve computationally hard problems. A scheduler based on abstraction refinement first attempts to solve the scheduling problem with abstract representations of the job and the computing resources. As abstract representations are generally small, the scheduling can be done reasonably fast. If the obtained schedule does not meet specified quality conditions (such as data center utilization or schedule makespan), then the scheduler refines the job and data center abstractions and solves the scheduling problem again. We develop different schedulers based on abstraction refinement. We implemented these schedulers and used them to schedule task graphs from various computing domains on simulated data centers with realistic topologies. We compared the speed of scheduling and the quality of the produced schedules with our abstraction refinement schedulers against a baseline scheduler that does not use any abstraction. We conclude that abstraction refinement techniques give a significant speed-up compared to traditional static scheduling heuristics, at a reasonable cost in the quality of the produced schedules. We further used our static schedulers in an actual system that we deployed on Amazon EC2 and compared it against the Hadoop dynamic scheduler for large MapReduce jobs. Our experiments indicate that there is great potential for static scheduling techniques.
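The solve-check-refine loop described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. It assumes independent tasks (no task-graph edges), identical nodes, abstraction of the job only (consecutive tasks merged into one abstract task), and a makespan-based quality condition. The names `abstract_job`, `greedy_makespan`, `schedule_by_refinement`, and the `slack` parameter are hypothetical, and the greedy longest-task-first heuristic stands in for whatever solver the paper uses.

```python
import heapq
from typing import List, Tuple


def abstract_job(durations: List[float], group_size: int) -> List[float]:
    """Merge consecutive tasks into abstract tasks (duration = sum of the group)."""
    return [sum(durations[i:i + group_size])
            for i in range(0, len(durations), group_size)]


def greedy_makespan(durations: List[float], num_nodes: int) -> float:
    """Longest-task-first list scheduling on identical nodes; returns the makespan."""
    loads = [0.0] * num_nodes  # a list of equal values is already a valid min-heap
    for d in sorted(durations, reverse=True):
        least = heapq.heappop(loads)       # least-loaded node so far
        heapq.heappush(loads, least + d)   # place the task on that node
    return max(loads)


def schedule_by_refinement(durations: List[float], num_nodes: int,
                           slack: float = 1.2) -> Tuple[int, float]:
    """Refine the job abstraction until the makespan is within `slack` of a lower bound.

    Returns (group_size, makespan) of the first abstraction level that is accepted.
    """
    # No schedule can beat the average load per node or the longest single task.
    lower_bound = max(sum(durations) / num_nodes, max(durations))
    group_size = len(durations)  # coarsest abstraction: the whole job is one task
    while True:
        makespan = greedy_makespan(abstract_job(durations, group_size), num_nodes)
        # Accept if the quality condition holds or nothing is left to refine.
        if makespan <= slack * lower_bound or group_size == 1:
            return group_size, makespan
        group_size = max(1, group_size // 2)  # refine: halve the group size
```

For 16 unit-length tasks on 4 nodes, `schedule_by_refinement([1.0] * 16, 4)` rejects the abstractions with groups of 16 and 8 tasks and accepts groups of 4, so it only ever schedules four abstract tasks instead of the 16 concrete ones. Because all tasks of a group run back to back on one node, every abstract schedule is also a valid schedule for the concrete job.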

Original language: English (US)
Title of host publication: EuroSys'11 - Proceedings of the EuroSys 2011 Conference
Pages: 329-342
Number of pages: 14
DOI: https://doi.org/10.1145/1966445.1966476
State: Published - 2011
Event: 6th ACM EuroSys Conference on Computer Systems, EuroSys 2011 - Salzburg, Austria
Duration: Apr 10 2011 → Apr 13 2011

Other

Other: 6th ACM EuroSys Conference on Computer Systems, EuroSys 2011
Country: Austria
City: Salzburg
Period: 4/10/11 → 4/13/11

Fingerprint

Scheduling
Grid computing
Real time systems
Topology
Costs
Experiments

Keywords

  • Abstraction refinement
  • Data centers
  • Scheduling

ASJC Scopus subject areas

  • Control and Systems Engineering

Cite this

Henzinger, T. A., Singh, V., Wies, T., & Zufferey, D. (2011). Scheduling large jobs by abstraction refinement. In EuroSys'11 - Proceedings of the EuroSys 2011 Conference (pp. 329-342) https://doi.org/10.1145/1966445.1966476

Scheduling large jobs by abstraction refinement. / Henzinger, Thomas A.; Singh, Vasu; Wies, Thomas; Zufferey, Damien.

EuroSys'11 - Proceedings of the EuroSys 2011 Conference. 2011. p. 329-342.

Henzinger, TA, Singh, V, Wies, T & Zufferey, D 2011, Scheduling large jobs by abstraction refinement. in EuroSys'11 - Proceedings of the EuroSys 2011 Conference. pp. 329-342, 6th ACM EuroSys Conference on Computer Systems, EuroSys 2011, Salzburg, Austria, 4/10/11. https://doi.org/10.1145/1966445.1966476
Henzinger TA, Singh V, Wies T, Zufferey D. Scheduling large jobs by abstraction refinement. In EuroSys'11 - Proceedings of the EuroSys 2011 Conference. 2011. p. 329-342 https://doi.org/10.1145/1966445.1966476
Henzinger, Thomas A. ; Singh, Vasu ; Wies, Thomas ; Zufferey, Damien. / Scheduling large jobs by abstraction refinement. EuroSys'11 - Proceedings of the EuroSys 2011 Conference. 2011. pp. 329-342
@inproceedings{158dcb1aaa4f4ca19cdd6a05bb0b316e,
title = "Scheduling large jobs by abstraction refinement",
abstract = "The static scheduling problem often arises as a fundamental problem in real-time systems and grid computing. We consider the problem of statically scheduling a large job expressed as a task graph on a large number of computing nodes, such as a data center. This paper solves the large-scale static scheduling problem using abstraction refinement, a technique commonly used in formal verification to efficiently solve computationally hard problems. A scheduler based on abstraction refinement first attempts to solve the scheduling problem with abstract representations of the job and the computing resources. As abstract representations are generally small, the scheduling can be done reasonably fast. If the obtained schedule does not meet specified quality conditions (like data center utilization or schedule makespan) then the scheduler refines the job and data center abstractions and, again solves the scheduling problem. We develop different schedulers based on abstraction refinement. We implemented these schedulers and used them to schedule task graphs from various computing domains on simulated data centers with realistic topologies. We compared the speed of scheduling and the quality of the produced schedules with our abstraction refinement schedulers against a baseline scheduler that does not use any abstraction. We conclude that abstraction refinement techniques give a significant speed-up compared to traditional static scheduling heuristics, at a reasonable cost in the quality of the produced schedules. We further used our static schedulers in an actual system that we deployed on Amazon EC2 and compared it against the Hadoop dynamic scheduler for large MapReduce jobs. Our experiments indicate that there is great potential for static scheduling techniques.",
keywords = "Abstraction refinement, Data centers, Scheduling",
author = "Henzinger, Thomas A. and Singh, Vasu and Wies, Thomas and Zufferey, Damien",
year = "2011",
doi = "10.1145/1966445.1966476",
language = "English (US)",
isbn = "9781450306348",
pages = "329--342",
booktitle = "EuroSys'11 - Proceedings of the EuroSys 2011 Conference"
}

TY - GEN
T1 - Scheduling large jobs by abstraction refinement
AU - Henzinger, Thomas A.
AU - Singh, Vasu
AU - Wies, Thomas
AU - Zufferey, Damien
PY - 2011
Y1 - 2011
AB - The static scheduling problem often arises as a fundamental problem in real-time systems and grid computing. We consider the problem of statically scheduling a large job expressed as a task graph on a large number of computing nodes, such as a data center. This paper solves the large-scale static scheduling problem using abstraction refinement, a technique commonly used in formal verification to efficiently solve computationally hard problems. A scheduler based on abstraction refinement first attempts to solve the scheduling problem with abstract representations of the job and the computing resources. As abstract representations are generally small, the scheduling can be done reasonably fast. If the obtained schedule does not meet specified quality conditions (like data center utilization or schedule makespan) then the scheduler refines the job and data center abstractions and, again solves the scheduling problem. We develop different schedulers based on abstraction refinement. We implemented these schedulers and used them to schedule task graphs from various computing domains on simulated data centers with realistic topologies. We compared the speed of scheduling and the quality of the produced schedules with our abstraction refinement schedulers against a baseline scheduler that does not use any abstraction. We conclude that abstraction refinement techniques give a significant speed-up compared to traditional static scheduling heuristics, at a reasonable cost in the quality of the produced schedules. We further used our static schedulers in an actual system that we deployed on Amazon EC2 and compared it against the Hadoop dynamic scheduler for large MapReduce jobs. Our experiments indicate that there is great potential for static scheduling techniques.
KW - Abstraction refinement
KW - Data centers
KW - Scheduling
UR - http://www.scopus.com/inward/record.url?scp=79955969528&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79955969528&partnerID=8YFLogxK
U2 - 10.1145/1966445.1966476
DO - 10.1145/1966445.1966476
M3 - Conference contribution
AN - SCOPUS:79955969528
SN - 9781450306348
SP - 329
EP - 342
BT - EuroSys'11 - Proceedings of the EuroSys 2011 Conference
ER -