TimeEval in cluster mode (resource constraints) #8
-
Hi,
rcs = ResourceConstraints(
    task_memory_limit=3 * GB,
    task_cpu_limit=1.0,
    tasks_per_host=32,
    execute_timeout=Duration("2 hours")
)
timeeval = TimeEval(
    dm,
    datasets,
    algorithms,
    repetitions=repetitions,
    metrics=[Metric.ROC_AUC, Metric.RANGE_PR_AUC],
    remote_config=cluster_config,
    resource_constraints=rcs,
    distributed=True
)
-
For our evaluation paper, we used a homogeneous cluster with 10 cores (20 with HT) and 32 GB of RAM per node. To be as fair as possible, we assigned 10 tasks per node and limited each task (algorithm execution) to a single core and 3 GB of RAM (10 · 3 GB = 30 GB < 32 GB):

rcs = ResourceConstraints(
    task_memory_limit=3 * GB,
    task_cpu_limit=1.0,
    tasks_per_host=10,
    execute_timeout=Duration("2 hours")
)

Unfortunately, TimeEval assumes a homogeneous cluster, and you can't specify a different number of tasks for different nodes. This means the node with the least amount of resources determines the limits: one of your 8-core/16 GB nodes.

To question 1:
I would use your biggest machine (32 cores/32 GB) to run the driver and the scheduler if you also want to put workers on it. If you want a separate machine for the driver and scheduler that does not participate in the heavy lifting, you could also use the smallest machine or even your own personal computer. I wouldn't advise using your personal computer, though, because you would need a stable network connection to the rest of the cluster, and you would probably not be able to use it for anything else in the meantime.

To question 2:
If you want to reproduce the paper results, you would need to give each task 3 GB of RAM and 1 CPU, which means that you can fit only 5 tasks on each host (limited by the RAM: 5 · 3 GB = 15 GB < 16 GB). Over-provisioning would impact the performance of the algorithms in a nondeterministic way.

rcs = ResourceConstraints(
    task_memory_limit=3 * GB,
    task_cpu_limit=1.0,
    tasks_per_host=5,
    execute_timeout=Duration("2 hours")
)

The following configuration would, however, give you optimal resource usage (ignoring the non-homogeneity):

rcs = ResourceConstraints(
    task_memory_limit=2 * GB,
    task_cpu_limit=1.0,
    tasks_per_host=8,
)

To question 3:
We are legally not allowed to share these datasets (which, in our opinion, also applies to the preprocessed versions). You can get these datasets from their original sources and must preprocess them yourself. You can find the preprocessing scripts (notebooks) for all datasets in the
-
Can I get the metrics (ROC_AUC, PR, ...) directly if I call my algorithm like this?

python algo_custom.py '{
    "executionType": "execute",
    "dataInput": "/home/ubuntu/timeeval-datasets/univariate/KDD-TSAD/011_UCR_Anomaly_DISTORTEDECG1.test.csv",
    "dataOutput": "scores_algo_custom.csv",
    "customParameters": { "window_size": 10 }
}'
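For context on the question above: a manual call like this only has the algorithm write raw anomaly scores to the dataOutput file; the metrics are normally computed by TimeEval in a separate step. A minimal sketch of computing ROC_AUC from such scores afterwards, using scikit-learn with small made-up arrays standing in for the real scores file and ground-truth labels:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical stand-ins: in practice, `scores` would be loaded from
# scores_algo_custom.csv and `labels` from the dataset's anomaly column.
labels = np.array([0, 0, 1, 1, 0])
scores = np.array([0.1, 0.2, 0.9, 0.8, 0.3])

# Both anomalies score higher than every normal point, so AUC is 1.0 here.
print(roc_auc_score(labels, scores))
```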