The PIVOT scheduling simulator is developed for in-depth evaluation of PIVOT scheduling algorithms in cross-cloud, geo-distributed environments. It simulates the cross-cloud infrastructure on top of VMs and networks provisioned by AWS and GCP, and runs data-intensive applications as containers atop that infrastructure. The applications are workflows composed of data-processing tasks with data dependencies among them. We use batch jobs sampled from the Alibaba 2018 cluster trace as the default workload; any other workload in the same format can also be used by the simulator.

The simulator is dockerized and can be run as a Docker container as shown below:
$ docker run -ti --rm \
    -v <local-job-dir>:/jobs \
    -v <local-output-dir>:/output \
    dchampion24/pivot-scheduling:alibaba \
    --num-hosts 100 overall --num-apps 100
The <local-job-dir> is the directory containing the YAML-formatted job files. Sample job files collected from the Alibaba cluster trace are provided here. The <local-output-dir> is the directory for storing the experimental data and plots generated by the simulation. Both directories must be given as absolute paths to comply with the docker run bind-mount rules.
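For example, assuming the sample job files are kept in ./jobs and the results should go to ./output under the current working directory (both paths are placeholders), the absolute paths can be supplied via $(pwd):

$ docker run -ti --rm \
    -v "$(pwd)/jobs":/jobs \
    -v "$(pwd)/output":/output \
    dchampion24/pivot-scheduling:alibaba \
    --num-hosts 100 overall --num-apps 100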
The simulator takes a number of parameters for tweaking the simulation, as shown below:
$ docker run -ti dchampion24/pivot-scheduling:alibaba -h
usage: sim.py [-h] [--num-hosts N_HOSTS] [--cpus CPUS] [--mem MEM]
              [--disk DISK] [--gpus GPUS] [--job-dir JOB_DIR]
              [--output-dir OUTPUT_DIR]
              [--task-output-scale-factor OUTPUT_SCALE_FACTOR]
              {overall,num-apps} ...

Run simulation on Alibaba cluster trace

positional arguments:
  {overall,num-apps}    Experiment type
    overall             Run the overall experiment
    num-apps            Run the experiment with varying number of applications

optional arguments:
  -h, --help            show this help message and exit
  --num-hosts N_HOSTS   Number of hosts
  --cpus CPUS           Number of CPUs per host
  --mem MEM             RAM in MBs per host
  --disk DISK           Disk space in GBs per host
  --gpus GPUS           Number of GPU units per host
  --job-dir JOB_DIR     Batch job directory
  --output-dir OUTPUT_DIR
                        Output directory for results
  --task-output-scale-factor OUTPUT_SCALE_FACTOR
                        Scale factor of the output data size of tasks
                        (proportional to the memory demand)
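As an illustration, the cluster profile can be tweaked with the host resource flags listed above. The following invocation (all values are illustrative, not defaults) simulates 200 hosts, each with 32 CPUs, 64 GB of RAM (65536 MB) and 512 GB of disk, and runs the overall experiment:

$ docker run -ti --rm \
    -v <local-job-dir>:/jobs \
    -v <local-output-dir>:/output \
    dchampion24/pivot-scheduling:alibaba \
    --num-hosts 200 --cpus 32 --mem 65536 --disk 512 \
    overall --num-apps 100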
The output data is available under <local-output-dir>/<experiment>/<timestamp>, where <experiment> is either overall or n_app, depending on the experiment type. There are two sub-directories: the raw experimental data is stored in data/ and the plots are stored in plot/.
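For instance, the results of an overall run can be inspected as follows (the <timestamp> directory name is generated per run):

$ ls -F <local-output-dir>/overall/<timestamp>
data/  plot/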
Custom batch jobs can be sampled from the Alibaba cluster trace dataset with the provided sample.py script:

$ python3 sample.py -h
usage: sample.py [-h] --num-jobs N_JOBS [--min-runtime MIN_RUNTIME]
                 [--max-runtime MAX_RUNTIME] --start START --interval INTERVAL
                 [--min-deps MIN_DEPS] [--max-parallel MAX_PARALLEL]
                 --output-dir OUTPUT_DIR

Script for sampling batch jobs from Alibaba cluster trace dataset

optional arguments:
  -h, --help            show this help message and exit
  --num-jobs N_JOBS, -n N_JOBS
                        Number of sampled jobs
  --min-runtime MIN_RUNTIME, -l MIN_RUNTIME
                        Minimum runtime
  --max-runtime MAX_RUNTIME, -u MAX_RUNTIME
                        Maximum runtime
  --start START, -s START
                        Start timestamp of the sampling
  --interval INTERVAL, -i INTERVAL
                        Interval of the sampling
  --min-deps MIN_DEPS, -d MIN_DEPS
                        Minimum number of tasks with dependencies in a job
  --max-parallel MAX_PARALLEL, -p MAX_PARALLEL
                        Maximum level of parallelism of tasks
  --output-dir OUTPUT_DIR, -o OUTPUT_DIR
                        Output directory of the sample data
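For example, a hypothetical sampling run (all values are illustrative, and timestamps/intervals are assumed to follow the units used in the trace) that draws 100 jobs starting at timestamp 86400 over an interval of 3600, keeps jobs with runtimes between 30 and 3600, requires at least 2 dependent tasks per job, caps task parallelism at 10, and writes the sampled YAML files to ./jobs would look like:

$ python3 sample.py \
    --num-jobs 100 \
    --start 86400 --interval 3600 \
    --min-runtime 30 --max-runtime 3600 \
    --min-deps 2 --max-parallel 10 \
    --output-dir ./jobs

The resulting job files can then be mounted into the simulator container as <local-job-dir>.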