TimeEval in cluster mode (resource constraints) #8
-
Hi,
rcs = ResourceConstraints(
    task_memory_limit=3 * GB,
    task_cpu_limit=1.0,
    tasks_per_host=32,
    execute_timeout=Duration("2 hours")
)
timeeval = TimeEval(
    dm,
    datasets,
    algorithms,
    repetitions=repetitions,
    metrics=[Metric.ROC_AUC, Metric.RANGE_PR_AUC],
    remote_config=cluster_config,
    resource_constraints=rcs,
    distributed=True
)
-
For our evaluation paper, we used a homogeneous cluster with 10 cores (20 with HT) and 32 GB of RAM per node. To be as fair as possible, we assigned 10 tasks per node and limited each task (algorithm execution) to a single core and 3 GB of RAM (10 · 3 GB = 30 GB < 32 GB):

rcs = ResourceConstraints(
    task_memory_limit=3 * GB,
    task_cpu_limit=1.0,
    tasks_per_host=10,
    execute_timeout=Duration("2 hours")
)

Unfortunately, TimeEval assumes a homogeneous cluster, and you can't specify a different number of tasks for different nodes. This means the node with the least amount of resources determines the limits: one of your 8-core/16 GB nodes.

To question 1:
I would use your biggest machine (32 cores/32 GB) to run the driver and the scheduler if you also want to put workers on it. If you want a separate machine for the driver and scheduler that does not participate in the heavy lifting, you could also use the smallest machine or even your own personal computer. I wouldn't advise using your personal computer, though, because you would need a stable network connection to the rest of the cluster, and you would probably not be able to use it for anything else in the meantime.

To question 2:
If you want to reproduce the paper results, you would need to give each task 3 GB of RAM and 1 CPU, which means that you can fit only 5 tasks on each host (limited by the RAM: 5 · 3 GB = 15 GB < 16 GB). Over-provisioning would impact the performance of the algorithms in a nondeterministic way.

rcs = ResourceConstraints(
    task_memory_limit=3 * GB,
    task_cpu_limit=1.0,
    tasks_per_host=5,
    execute_timeout=Duration("2 hours")
)

The following configuration would, however, give you optimal resource usage (ignoring the non-homogeneity):

rcs = ResourceConstraints(
    task_memory_limit=2 * GB,
    task_cpu_limit=1.0,
    tasks_per_host=8,
)

To question 3:
We are legally not allowed to share these datasets (which, in our opinion, also applies to the preprocessed versions). You can get these datasets from their original sources and must preprocess them yourself. You can find the preprocessing scripts (notebooks) for all datasets in the
-
Can I get the metrics (ROC_AUC, PR, ...) directly if I call my algorithm like this?

python algo_custom.py '{
    "executionType": "execute",
    "dataInput": "/home/ubuntu/timeeval-datasets/univariate/KDD-TSAD/011_UCR_Anomaly_DISTORTEDECG1.test.csv",
    "dataOutput": "scores_algo_custom.csv",
    "customParameters": { "window_size": 10 }
}'
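For context on the question above: a manual call like this only has the algorithm write raw anomaly scores to the dataOutput file; the metrics are normally computed by TimeEval in a separate step. A minimal sketch of computing ROC_AUC from such scores afterwards, using scikit-learn with small made-up arrays standing in for the real scores file and ground-truth labels:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical stand-ins: in practice, `scores` would be loaded from
# scores_algo_custom.csv and `labels` from the dataset's anomaly column.
labels = np.array([0, 0, 1, 1, 0])
scores = np.array([0.1, 0.2, 0.9, 0.8, 0.3])

# Both anomalies score higher than every normal point, so AUC is 1.0 here.
print(roc_auc_score(labels, scores))
```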