TimeEval and Experiment File example (hash param id) #88
-
Hi, for example: I want to run Algorithm A1 over Dataset D1 with params (k=5, lambda=0.1) and A1 over Dataset D2 with params (k=30, lambda=0.5). My datasets.csv file contains D1 and D2, and I want only one execution of A1 over D1 (k=5, lambda=0.1) and one of A1 over D2 (k=30, lambda=0.5). Here's what I tried, but it gives me 8 runs when I only need the two runs mentioned above. I know it is normal to get 8 runs using FullParameterGrid, but is there another way to meet my needs?

```python
algorithms = [
    Algorithm(
        name="A1",
        main=DockerAdapter(image_name="registry.gitlab.hpi.de/akita/i/A1", tag="2", skip_pull=True),
        data_as_file=True,
        param_config=FullParameterGrid({
            "lambda": [0.1, 0.5],
            "k": [30, 5],
        }),
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.MULTIVARIATE,
    ),
]
```
-
Dear @B-Seif, there are currently two ways to achieve this. You already found the first one. For the second one, you can supply TimeEval a path to an experiment combinations CSV file with specific combinations of algorithms, datasets, and hyperparameters. TimeEval then executes only the experiments that are present in both the TimeEval configuration and this file. Because we designed this feature for the easy re-execution of previously failed experiments, TimeEval assumes that you already know the hyperparameter IDs. In your case, they are not known up front, but you can compute them with the `hash_dict` utility. I created a MWP for you:

```python
from typing import Dict, Any
from pathlib import Path

import numpy as np
import pandas as pd

from timeeval import TimeEval, DatasetManager, Algorithm, TrainingType, InputDimensionality
from timeeval.adapters import FunctionAdapter
from timeeval.params import FullParameterGrid
from timeeval.utils.hash_dict import hash_dict

# Load dataset metadata
dm = DatasetManager(Path.cwd() / "tests" / "example_data", create_if_missing=False)

# Define algorithm
def my_algorithm(data: np.ndarray, args: Dict[str, Any]) -> np.ndarray:
    print(f"Running Algo with lambda={args.get('lambda', None)} and k={args.get('k', None)}")
    return np.full_like(data, fill_value=0., dtype=np.float_)

# Select datasets and algorithms
datasets = dm.select()

# Add algorithms to evaluate...
algorithms = [
    Algorithm(
        name="MyAlgorithm",
        main=FunctionAdapter(my_algorithm),
        data_as_file=False,
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.UNIVARIATE,
        param_config=FullParameterGrid({"lambda": [0.1, 0.5], "k": [5, 30]})
    )
]

# Create executions file
df = pd.DataFrame([
    ["MyAlgorithm", "test", "dataset-datetime", hash_dict({"lambda": 0.1, "k": 5})],
    ["MyAlgorithm", "test", "dataset-int", hash_dict({"lambda": 0.5, "k": 30})]
], columns=["algorithm", "collection", "dataset", "hyper_params_id"])
df.to_csv("experiments.csv", index=False)

timeeval = TimeEval(dm, datasets, algorithms,
                    experiment_combinations_file=Path.cwd() / "experiments.csv")

# execute evaluation
timeeval.run()

# retrieve results
print(timeeval.get_results(aggregated=False, short=False))
```

When you run this code, you will see that there are only two experiments (the ones specified in the experiments file).
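If you want to double-check which IDs TimeEval will look up, you can print them directly. A minimal sketch reusing the same `hash_dict` helper as above (the concrete hash strings are whatever your installed version computes for these dicts):

```python
from timeeval.utils.hash_dict import hash_dict

# These IDs are what goes into the hyper_params_id column of experiments.csv:
for params in [{"lambda": 0.1, "k": 5}, {"lambda": 0.5, "k": 30}]:
    print(params, "->", hash_dict(params))
```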
-
Dear @CodeLionX, this didn't work for me, however. Here is the error I get. Maybe I have an old version of TimeEval (1.2.10)?
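A quick way to read off the installed version, using only the standard library (the distribution name is assumed to be "TimeEval", as on PyPI):

```python
from importlib.metadata import version

# Prints the installed TimeEval version, e.g. 1.2.10
print(version("TimeEval"))
```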
I think the error comes from here (timeeval.py, line 395, function get_future_result):

```python
def get_future_result(f: Future) -> Tuple[Any, ...]:
    try:
        r = f.result()
        return tuple(r.get(k, None) for k in result_keys) + (Status.OK, None)
    except DockerTimeoutError as e:
        self.log.exception(f"Exception {repr(e)} occurred remotely.")
        status = Status.TIMEOUT
        error_message = repr(e)
    except DockerMemoryError as e:
        self.log.exception(f"Exception {repr(e)} occurred remotely.")
        status = Status.OOM
        error_message = repr(e)
    except Exception as e:
        self.log.exception(f"Exception {repr(e)} occurred remotely.")
        status = Status.ERROR
        error_message = repr(e)
    return tuple(np.nan for _ in result_keys) + (status, error_message)

print("Error from here empty data-frame ??!!!!!!! : \n", self.results)
self.results[keys] = self.results["future_result"].apply(get_future_result).tolist()
self.results = self.results.drop(['future_result'], axis=1)
```

EDIT2
The experiments file:

The Python code:

```python
dm = MultiDatasetManager([Path("/home/ubuntu/timeeval-datasets/")])
datasets = dm.select()
timeeval = TimeEval(
    dm,
    datasets,
    algorithms,
    repetitions=repetitions,
    metrics=metrics,
    # remote_config=cluster_config,
    resource_constraints=rcs,
    # distributed=True,
    experiment_combinations_file=Path("/home/ubuntu/TimeEval-test/experiments.csv")
)
timeeval.run()
results = timeeval.get_results(aggregated=True)
print(results)
```

The datasets.csv:
-
Hi @CodeLionX,
-
In this experiment I'm using only one algorithm (with the DockerAdapter) in distributed mode. This algorithm has 8 hyperparameters, and each of those parameters has 10 values. I want to execute only 10 combinations. In another execution, I used only 2 values per parameter and the Prepare phase was fast, so I think it really depends on the hyperparameter configuration. Is it possible to create only the folders that are mentioned in the experiments file? If I take your example: why do you have to create 4 folders (0.1 & 5, 0.1 & 30, 0.5 & 5, and 0.5 & 30) when we only want two executions (0.1 & 5 and 0.5 & 30)?

```python
algorithms = [
    Algorithm(
        name="MyAlgorithm",
        main=FunctionAdapter(my_algorithm),
        data_as_file=False,
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.UNIVARIATE,
        param_config=FullParameterGrid({"lambda": [0.1, 0.5], "k": [5, 30]})
    )
]

# Create executions file
df = pd.DataFrame([
    ["MyAlgorithm", "test", "dataset-datetime", hash_dict({"lambda": 0.1, "k": 5})],
    ["MyAlgorithm", "test", "dataset-int", hash_dict({"lambda": 0.5, "k": 30})]
], columns=["algorithm", "collection", "dataset", "hyper_params_id"])
df.to_csv("experiments.csv", index=False)

timeeval = TimeEval(dm, datasets, algorithms,
                    experiment_combinations_file=Path.cwd() / "experiments.csv")
```
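To make the count explicit: with 8 parameters of 10 values each, the full grid has 10^8 settings even though my experiments file only keeps 10 of them. A toy sketch with scikit-learn's ParameterGrid (assuming FullParameterGrid enumerates the same cross-product):

```python
from sklearn.model_selection import ParameterGrid

# 2 lambda values x 2 k values -> 4 parameter settings (hence 4 folders):
toy = ParameterGrid({"lambda": [0.1, 0.5], "k": [5, 30]})
print(len(toy))  # 4
```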
-
Yes, I confirm that I use a single algorithm through a Docker image (present on all workers), and that everything happens quickly if I do not put too many possible values in the FullParameterGrid. Could you give me the function that prints the message 'Running Prepare Phase'? I want to inspect why I have this problem. How can I ensure that I am using the latest version of TimeEval (the master branch)?

EDIT

```python
from typing import Dict, Any
from pathlib import Path

import numpy as np
import pandas as pd

from timeeval import TimeEval, DatasetManager, Algorithm, TrainingType, InputDimensionality
from timeeval import MultiDatasetManager, ResourceConstraints
from timeeval.adapters import FunctionAdapter
from timeeval.params import FullParameterGrid
from timeeval.utils.hash_dict import hash_dict

# Load dataset metadata
# dm = DatasetManager(Path.cwd() / "tests" / "example_data", create_if_missing=True)
dm = MultiDatasetManager([Path("/home/ubuntu/timeeval-datasets/")])

# Define algorithm
def my_algorithm(data: np.ndarray, args: Dict[str, Any]) -> np.ndarray:
    print(f"Running Algo with lambda={args.get('lambda', None)} and k={args.get('k', None)}")
    return np.random.rand(data.shape[0])

# Select datasets and algorithms
datasets = dm.select()
print(len(datasets))

# Add algorithms to evaluate...
algorithms = [
    Algorithm(
        name="MyAlgorithm",
        main=FunctionAdapter(my_algorithm),
        data_as_file=False,
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.MULTIVARIATE,
        param_config=FullParameterGrid({
            "lambda": [0.1, 0.5, 0.2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 236, 45, 256, 45, 26, 23, 78],
            "k": [5, 1234, 569, 789, 30, 1, 2, 3, 5, 6, 812, 45, 23, 111, 89, 45, 1789]})
    )
]

# Create executions file
df = pd.DataFrame([
    ["MyAlgorithm", "CalIt2", "CalIt2-traffic", hash_dict({"lambda": 0.1, "k": 5})],
    ["MyAlgorithm", "CalIt2", "CalIt2-traffic", hash_dict({"lambda": 0.5, "k": 30})]
], columns=["algorithm", "collection", "dataset", "hyper_params_id"])
df.to_csv("experiments.csv", index=False)

timeeval = TimeEval(dm, datasets, algorithms,
                    experiment_combinations_file=Path("/home/ubuntu/TimeEval-test/experiments.csv"))

# execute evaluation
timeeval.run()

# retrieve results
print(timeeval.get_results(aggregated=False, short=False))
```

Again, the execution gets blocked in the preparation phase.
Can you successfully execute my MWP from my previous answer?

I suppose that your algorithm configuration does not match the experiment combinations file. Please provide a full MWP: your script is missing the information about the algorithm configuration.
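For reference, if you only ever need a handful of hand-picked parameter settings, you can also avoid the grid cross-product entirely. A minimal sketch, assuming your TimeEval version ships `timeeval.params.FixedParameters` (the per-setting algorithm names are just for illustration):

```python
from timeeval import Algorithm, TrainingType, InputDimensionality
from timeeval.adapters import FunctionAdapter
from timeeval.params import FixedParameters  # assumption: available in your TimeEval version

# One Algorithm entry per desired parameter setting, so the prepare phase
# only ever sees the settings listed here instead of a full grid:
algorithms = [
    Algorithm(
        name="MyAlgorithm-a",
        main=FunctionAdapter(my_algorithm),  # my_algorithm as defined in the MWP above
        data_as_file=False,
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.UNIVARIATE,
        param_config=FixedParameters({"lambda": 0.1, "k": 5}),
    ),
    Algorithm(
        name="MyAlgorithm-b",
        main=FunctionAdapter(my_algorithm),
        data_as_file=False,
        training_type=TrainingType.UNSUPERVISED,
        input_dimensionality=InputDimensionality.UNIVARIATE,
        param_config=FixedParameters({"lambda": 0.5, "k": 30}),
    ),
]
```

Note that TimeEval still crosses every algorithm entry with every selected dataset, so to pin each setting to a single dataset you would still combine this with the experiment combinations file (or run TimeEval once per dataset selection).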