
A Reference Architecture for Datacenter Scheduling

This release contains the software artifacts of the paper A Reference Architecture for Datacenter Scheduling, presented at Supercomputing 2018 (SC18).

For the paper, experiments were run on the following traces:

  • Askalon (W-Eng) - askalon_workload_ee
  • Chronos (W-Ind) - chronos_exp_noscaler_ca

Each trace directory has the following structure:

  • /setup.txt
    This text file describes the trace used for the experiment, the number of
    times the experiment was repeated, and the number of warm-up runs.

  • /setup.json
    This JSON file describes the topology of the datacenter used in the
    experiments. Each item lists the identifier of the resource (here, the
    CPU type) to use in the machine. The available CPU types are (1) Intel i7
    (4 cores, 4100 MHz) and (2) Intel i5 (2 cores, 3500 MHz).

  • /trace
    This directory contains the trace used in the simulation. The trace is
    stored in the Grid Workload Format (GWF); see the Grid Workload Archive
    for more information.

  • /data/experiments.csv
    A CSV file containing information about all simulations that have been run
    on the OpenDC platform for this experiment.

  • /data/job_metrics.csv
    A CSV file containing metrics (NSL, JMS, etc.) for each job that ran
    during the simulations.

  • /data/stage_measurements.csv
    A CSV file containing timing measurements for the scheduling stages that
    ran during the simulations.

  • /data/task_metrics.csv
    A CSV file containing metrics for each task that ran during the simulations.

  • /data/tasks.csv
    A CSV file containing information about the tasks (submit time, runtime,
    etc.) that ran during the simulations, as extracted from the traces.

Additionally, the format of each data file is described in its associated
metadata file.
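
To get a quick impression of the results, you can inspect the data files from
the command line. A minimal sketch, assuming the archive layout described
above and a POSIX shell:

$ head -n 5 askalon_workload_ee/data/experiments.csv
$ wc -l askalon_workload_ee/data/task_metrics.csv
$ python -m json.tool askalon_workload_ee/setup.json

The first command prints the header and first rows of the experiments file,
the second counts the task records (plus one header line), and the third
pretty-prints the datacenter topology.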

Hardware

The hardware used for running the experiments is a MacBook Pro with
a 2.9 GHz Intel Core i7 processor and 16 GB of 2133 MHz LPDDR3 memory.

Reproduction

This section provides instructions for reproducing the paper's results using
the provided Docker image. Please make sure you have Docker installed
and running.
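
To verify that Docker is installed and the daemon is running before you start:

$ docker --version
$ docker info

The first command prints the installed client version; the second reports an
error if the Docker daemon is not running.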

For reproduction, you will run the following experiments:

  • askalon_workload_ee
    This is the larger experiment of the paper and will take approximately 4
    hours to complete on similar hardware.
  • chronos_exp_noscaler_ca
    This is the smaller experiment of the paper and will take approximately 5
    minutes to complete on similar hardware.

The Docker image atlargeresearch/sc18-experiment-runner can be used to run
the experiments. A volume can be attached to the directory
/home/gradle/simulator/data to capture the experiment results.

Make sure you have, in your current working directory, the following files
(a quick check follows the list):

  • /setup.json
    This JSON file describes the topology of the datacenter and can be found in
    this archive at askalon_workload_ee/setup.json.
  • /askalon_workload_ee.gwf
    This file contains the trace for the Askalon workload. This file can be found
    in the archive at askalon_workload_ee/trace/askalon_workload_ee.gwf.
  • /chronos_exp_noscaler_ca.gwf
    This file contains the trace for the Chronos workload. This file can be found
    in the archive at chronos_exp_noscaler_ca/trace/chronos_exp_noscaler_ca.gwf.
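
A quick way to check that all three files are present (ls reports an error
for any file that is missing):

$ ls setup.json askalon_workload_ee.gwf chronos_exp_noscaler_ca.gwf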

Then, you can start the Askalon experiments as follows:

$ docker run -it --rm -v $(pwd):/home/gradle/simulator/data atlargeresearch/sc18-experiment-runner -r 32 -w 4 -s data/setup.json data/askalon_workload_ee.gwf

The experiment runner can be configured with the following options (a combined
example follows the list):

  • -r, --repeat
    The number of times to repeat the experiment for each scheduler.
  • -w, --warm-up
    The number of warm-up runs of the simulator for each scheduler.
  • -p, --parallelism
    The number of experiments to run in parallel.
  • --schedulers
    The list of schedulers to test, separated by spaces. The following schedulers
    are available: SRTF-BESTFIT, SRTF-FIRSTFIT, SRTF-WORSTFIT,
    FIFO-BESTFIT, FIFO-FIRSTFIT, FIFO-WORSTFIT, RANDOM-BESTFIT,
    RANDOM-FIRSTFIT, RANDOM-WORSTFIT.
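
As an illustration of how these options compose, the following sketch runs a
smaller sweep of the Chronos trace with two schedulers and two experiments in
parallel. This assumes the flags combine as listed above and that the trace
argument can follow the scheduler list; adjust as needed:

$ docker run -it --rm -v $(pwd):/home/gradle/simulator/data atlargeresearch/sc18-experiment-runner -r 8 -w 2 -p 2 --schedulers SRTF-BESTFIT FIFO-FIRSTFIT -s data/setup.json data/chronos_exp_noscaler_ca.gwf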

After the Askalon experiments have finished, you can start the Chronos
experiments. Make sure you first copy the result files to another directory,
as they will be overwritten.
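
A minimal sketch for saving the Askalon results, assuming the runner writes
the result CSV files listed earlier directly into the mounted working
directory:

$ mkdir -p results-askalon
$ cp *.csv results-askalon/

Then start the Chronos experiments: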

$ docker run -it --rm -v $(pwd):/home/gradle/simulator/data atlargeresearch/sc18-experiment-runner -r 32 -w 4 -s data/setup.json data/chronos_exp_noscaler_ca.gwf