TimeEval and Experiment File example (hash param id) #88
-
Hi, for example: I want to run Algorithm A1 over Dataset D1 with params (k=5, lambda=0.1) and A1 over Dataset D2 with params (k=30, lambda=0.5). My datasets.csv file contains D1 and D2, and I want only one execution of A1 over D1 (k=5, lambda=0.1) and one of A1 over D2 (k=30, lambda=0.5). Here's what I tried, but it gives me 8 runs when I only need the two runs mentioned above. I know it is normal to get 8 runs using FullParameterGrid, but is there another way to meet my needs?

```python
algorithms = [
    Algorithm(
        name="A1",
        main=DockerAdapter(image_name="registry.gitlab.hpi.de/akita/i/A1", tag="2", skip_pull=True),
        data_as_file=True,
        param_config=FullParameterGrid({
            "lambda": [0.1, 0.5],
            "k": [30, 5],
        }),
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.MULTIVARIATE,
    ),
]
```
-
Dear @B-Seif, there are currently two ways to achieve this. You already found the first one. For the second one, you can supply TimeEval a path to an experiment combinations CSV file with specific combinations of algorithms, datasets, and hyperparameters. TimeEval then executes only the experiments that are present in both the TimeEval configuration and this file. Because we designed this feature for the easy re-execution of previously failed experiments, TimeEval assumes that you already know the hyperparameter IDs. In your case, they are not known up front, but you can compute them with the `hash_dict` utility. I created a MWP for you:

```python
from typing import Dict, Any
from pathlib import Path

import numpy as np
import pandas as pd

from timeeval import TimeEval, DatasetManager, Algorithm, TrainingType, InputDimensionality
from timeeval.adapters import FunctionAdapter
from timeeval.params import FullParameterGrid
from timeeval.utils.hash_dict import hash_dict

# Load dataset metadata
dm = DatasetManager(Path.cwd() / "tests" / "example_data", create_if_missing=False)

# Define algorithm
def my_algorithm(data: np.ndarray, args: Dict[str, Any]) -> np.ndarray:
    print(f"Running Algo with lambda={args.get('lambda', None)} and k={args.get('k', None)}")
    return np.full_like(data, fill_value=0., dtype=np.float_)

# Select datasets and algorithms
datasets = dm.select()

# Add algorithms to evaluate...
algorithms = [
    Algorithm(
        name="MyAlgorithm",
        main=FunctionAdapter(my_algorithm),
        data_as_file=False,
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.UNIVARIATE,
        param_config=FullParameterGrid({"lambda": [0.1, 0.5], "k": [5, 30]})
    )
]

# Create executions file
df = pd.DataFrame([
    ["MyAlgorithm", "test", "dataset-datetime", hash_dict({"lambda": 0.1, "k": 5})],
    ["MyAlgorithm", "test", "dataset-int", hash_dict({"lambda": 0.5, "k": 30})]
], columns=["algorithm", "collection", "dataset", "hyper_params_id"])
df.to_csv("experiments.csv", index=False)

timeeval = TimeEval(dm, datasets, algorithms,
                    experiment_combinations_file=Path.cwd() / "experiments.csv")

# execute evaluation
timeeval.run()

# retrieve results
print(timeeval.get_results(aggregated=False, short=False))
```

When you run this code, you will see that there are only two experiments (the ones specified in the experiments file).
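If you want to double-check which IDs TimeEval will look up, you can print them directly. A minimal sketch reusing the same `hash_dict` helper as above (the concrete hash strings are whatever your installed version computes for these dicts):

```python
from timeeval.utils.hash_dict import hash_dict

# These IDs are what goes into the hyper_params_id column of experiments.csv:
for params in [{"lambda": 0.1, "k": 5}, {"lambda": 0.5, "k": 30}]:
    print(params, "->", hash_dict(params))
```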
-
Dear @CodeLionX, this didn't work for me, however. Here is the error I get. Maybe I have an old version of TimeEval (1.2.10)?
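A quick way to read off the installed version, using only the standard library (the distribution name is assumed to be "TimeEval", as on PyPI):

```python
from importlib.metadata import version

# Prints the installed TimeEval version, e.g. 1.2.10
print(version("TimeEval"))
```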
I think the error comes from here (timeeval.py, line 395, function get_future_result):

```python
def get_future_result(f: Future) -> Tuple[Any, ...]:
    try:
        r = f.result()
        return tuple(r.get(k, None) for k in result_keys) + (Status.OK, None)
    except DockerTimeoutError as e:
        self.log.exception(f"Exception {repr(e)} occurred remotely.")
        status = Status.TIMEOUT
        error_message = repr(e)
    except DockerMemoryError as e:
        self.log.exception(f"Exception {repr(e)} occurred remotely.")
        status = Status.OOM
        error_message = repr(e)
    except Exception as e:
        self.log.exception(f"Exception {repr(e)} occurred remotely.")
        status = Status.ERROR
        error_message = repr(e)
    return tuple(np.nan for _ in result_keys) + (status, error_message)

print("Error from here empty data-frame ??!!!!!!! : \n", self.results)
self.results[keys] = self.results["future_result"].apply(get_future_result).tolist()
self.results = self.results.drop(['future_result'], axis=1)
```

EDIT2
The experiments file:

The Python code:

```python
dm = MultiDatasetManager([Path("/home/ubuntu/timeeval-datasets/")])
datasets = dm.select()
timeeval = TimeEval(
    dm,
    datasets,
    algorithms,
    repetitions=repetitions,
    metrics=metrics,
    # remote_config=cluster_config,
    resource_constraints=rcs,
    # distributed=True,
    experiment_combinations_file=Path("/home/ubuntu/TimeEval-test/experiments.csv")
)
timeeval.run()
results = timeeval.get_results(aggregated=True)
print(results)
```

The datasets.csv:
-
Hi @CodeLionX,
-
In this experiment I'm using only one algorithm (with the DockerAdapter) in distributed mode. This algorithm has 8 hyperparameters, and each of those parameters has 10 values. I want to execute only 10 combinations. In another execution, I used only 2 values per parameter and the Prepare phase was fast, so I think it really depends on the hyperparameter configuration. Is it possible to create only the folders that are mentioned in the experiments file? If I take your example: why do you have to create 4 folders (0.1 & 5, 0.1 & 30, 0.5 & 5, and 0.5 & 30) when we only want two executions (0.1 & 5 and 0.5 & 30)?

```python
algorithms = [
    Algorithm(
        name="MyAlgorithm",
        main=FunctionAdapter(my_algorithm),
        data_as_file=False,
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.UNIVARIATE,
        param_config=FullParameterGrid({"lambda": [0.1, 0.5], "k": [5, 30]})
    )
]

# Create executions file
df = pd.DataFrame([
    ["MyAlgorithm", "test", "dataset-datetime", hash_dict({"lambda": 0.1, "k": 5})],
    ["MyAlgorithm", "test", "dataset-int", hash_dict({"lambda": 0.5, "k": 30})]
], columns=["algorithm", "collection", "dataset", "hyper_params_id"])
df.to_csv("experiments.csv", index=False)

timeeval = TimeEval(dm, datasets, algorithms,
                    experiment_combinations_file=Path.cwd() / "experiments.csv")
```
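To make the count explicit: with 8 parameters of 10 values each, the full grid has 10^8 settings even though my experiments file only keeps 10 of them. A toy sketch with scikit-learn's ParameterGrid (assuming FullParameterGrid enumerates the same cross-product):

```python
from sklearn.model_selection import ParameterGrid

# 2 lambda values x 2 k values -> 4 parameter settings (hence 4 folders):
toy = ParameterGrid({"lambda": [0.1, 0.5], "k": [5, 30]})
print(len(toy))  # 4
```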
-
Yes, I confirm that I use a single algorithm through a Docker image (present on all workers), and that everything happens quickly if I do not put too many possible values in the FullParameterGrid. Could you give me the function that prints the message 'Running Prepare Phase'? I want to inspect why I have this problem. How can I ensure that I am using the latest version of TimeEval (the master branch)?

EDIT

```python
from typing import Dict, Any
from pathlib import Path

import numpy as np
import pandas as pd

from timeeval import TimeEval, DatasetManager, Algorithm, TrainingType, InputDimensionality
from timeeval import MultiDatasetManager, ResourceConstraints
from timeeval.adapters import FunctionAdapter
from timeeval.params import FullParameterGrid
from timeeval.utils.hash_dict import hash_dict

# Load dataset metadata
# dm = DatasetManager(Path.cwd() / "tests" / "example_data", create_if_missing=True)
dm = MultiDatasetManager([Path("/home/ubuntu/timeeval-datasets/")])

# Define algorithm
def my_algorithm(data: np.ndarray, args: Dict[str, Any]) -> np.ndarray:
    print(f"Running Algo with lambda={args.get('lambda', None)} and k={args.get('k', None)}")
    return np.random.rand(data.shape[0])

# Select datasets and algorithms
datasets = dm.select()
print(len(datasets))

# Add algorithms to evaluate...
algorithms = [
    Algorithm(
        name="MyAlgorithm",
        main=FunctionAdapter(my_algorithm),
        data_as_file=False,
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.MULTIVARIATE,
        param_config=FullParameterGrid({
            "lambda": [0.1, 0.5, 0.2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 236, 45, 256, 45, 26, 23, 78],
            "k": [5, 1234, 569, 789, 30, 1, 2, 3, 5, 6, 812, 45, 23, 111, 89, 45, 1789]})
    )
]

# Create executions file
df = pd.DataFrame([
    ["MyAlgorithm", "CalIt2", "CalIt2-traffic", hash_dict({"lambda": 0.1, "k": 5})],
    ["MyAlgorithm", "CalIt2", "CalIt2-traffic", hash_dict({"lambda": 0.5, "k": 30})]
], columns=["algorithm", "collection", "dataset", "hyper_params_id"])
df.to_csv("experiments.csv", index=False)

timeeval = TimeEval(dm, datasets, algorithms,
                    experiment_combinations_file=Path("/home/ubuntu/TimeEval-test/experiments.csv"))

# execute evaluation
timeeval.run()

# retrieve results
print(timeeval.get_results(aggregated=False, short=False))
```

Again, the execution gets blocked in the preparation phase.
Can you successfully execute my MWP from my previous answer?

I suppose that your algorithm configuration does not match the experiment combinations file. Please provide a full MWP: your script is missing the information about the algorithm configuration.
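For reference, if you only ever need a handful of hand-picked parameter settings, you can also avoid the grid cross-product entirely. A minimal sketch, assuming your TimeEval version ships `timeeval.params.FixedParameters` (the per-setting algorithm names are just for illustration):

```python
from timeeval import Algorithm, TrainingType, InputDimensionality
from timeeval.adapters import FunctionAdapter
from timeeval.params import FixedParameters  # assumption: available in your TimeEval version

# One Algorithm entry per desired parameter setting, so the prepare phase
# only ever sees the settings listed here instead of a full grid:
algorithms = [
    Algorithm(
        name="MyAlgorithm-a",
        main=FunctionAdapter(my_algorithm),  # my_algorithm as defined in the MWP above
        data_as_file=False,
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.UNIVARIATE,
        param_config=FixedParameters({"lambda": 0.1, "k": 5}),
    ),
    Algorithm(
        name="MyAlgorithm-b",
        main=FunctionAdapter(my_algorithm),
        data_as_file=False,
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.UNIVARIATE,
        param_config=FixedParameters({"lambda": 0.5, "k": 30}),
    ),
]
```

Note that TimeEval still crosses every algorithm entry with every selected dataset, so to pin each setting to a single dataset you would still combine this with the experiment combinations file (or run TimeEval once per dataset selection).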