Merge pull request #124 from automl/development
Development
KEggensperger authored Sep 6, 2021
2 parents 1732571 + 15c76af commit 8c0372a
Showing 71 changed files with 4,446 additions and 939 deletions.
5 changes: 5 additions & 0 deletions .github/workflows/run_singularity_versions.yml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,11 @@ jobs:
RUN_CONTAINER_EXAMPLES: true
USE_SINGULARITY: false
SINGULARITY_VERSION: "3.7"
- python-version: 3.7
DISPLAY_NAME: "Singularity Container Examples with S3.8"
RUN_CONTAINER_EXAMPLES: true
USE_SINGULARITY: false
SINGULARITY_VERSION: "3.8"

fail-fast: false

Expand Down
8 changes: 7 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -130,4 +130,10 @@ dmypy.json

# Misc
.idea/
experiments/
experiments/
.DS_Store

# Vagrant
.vagrant
Vagrantfile
/hpobench/container/recipes_local/
127 changes: 49 additions & 78 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,15 +1,22 @@
# HPOBench

HPOBench is a library for hyperparameter optimization and black-box optimization benchmark with a focus on reproducibility.
HPOBench is a library providing benchmarks for (multi-fidelity) hyperparameter optimization with a focus on reproducibility.

**Note:** HPOBench is under active construction. Stay tuned for more benchmarks. Information on how to contribute a new benchmark will follow shortly.
A list of benchmarks can be found in the [wiki](https://github.com/automl/HPOBench/wiki/Available-Containerized-Benchmarks) and a guide on how to contribute benchmarks is available [here](https://github.com/automl/HPOBench/wiki/How-to-add-a-new-benchmark-step-by-step).

**Note:** If you are looking for a different or older version of our benchmarking library, you might be looking for
[HPOlib1.5](https://github.com/automl/HPOlib1.5)
## Status

Status for Master Branch:
[![Build Status](https://github.com/automl/HPOBench/workflows/Test%20Pull%20Requests/badge.svg?branch=master)](https://github.com/automl/HPOBench/actions)
[![codecov](https://codecov.io/gh/automl/HPOBench/branch/master/graph/badge.svg)](https://codecov.io/gh/automl/HPOBench)

Status for Development Branch:
[![Build Status](https://github.com/automl/HPOBench/workflows/Test%20Pull%20Requests/badge.svg?branch=development)](https://github.com/automl/HPOBench/actions)
[![codecov](https://codecov.io/gh/automl/HPOBench/branch/development/graph/badge.svg)](https://codecov.io/gh/automl/HPOBench)

## In 4 lines of code

Run a random configuration within a singularity container
Evaluate a random configuration using a singularity container
```python
from hpobench.container.benchmarks.ml.xgboost_benchmark import XGBoostBenchmark
b = XGBoostBenchmark(task_id=167149, container_source='library://phmueller/automl', rng=1)
Expand All @@ -27,79 +34,45 @@ result_dict = b.objective_function(configuration=config, fidelity={"n_estimators
result_dict = b.objective_function(configuration=config, rng=1)
```
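The snippet above is truncated by the diff view. To illustrate the call pattern it relies on (`get_configuration_space`, then `objective_function` with `configuration`, `fidelity`, and `rng`), here is a minimal self-contained sketch; `MockBenchmark` and its toy quadratic loss are illustrative stand-ins, not HPOBench's real implementation:

```python
import random


class MockBenchmark:
    """Illustrative stand-in for an HPOBench benchmark (not the real API surface)."""

    def __init__(self, rng=None):
        self.rng = random.Random(rng)

    def get_configuration_space(self, seed=None):
        # Real benchmarks return a ConfigSpace.ConfigurationSpace;
        # here a plain dict of (low, high) bounds stands in for it.
        return {"eta": (0.0, 1.0), "max_depth": (1.0, 15.0)}

    def sample_configuration(self, seed=None):
        # Real benchmarks sample via the ConfigurationSpace object itself.
        rng = random.Random(seed)
        space = self.get_configuration_space(seed)
        return {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}

    def objective_function(self, configuration, fidelity=None, rng=None):
        # Real benchmarks train and evaluate a model at the given fidelity;
        # here a toy quadratic loss keeps the sketch runnable anywhere.
        loss = sum(v * v for v in configuration.values())
        return {"function_value": loss, "cost": 1.0}


b = MockBenchmark(rng=1)
config = b.sample_configuration(seed=1)
result = b.objective_function(configuration=config, fidelity={"n_estimators": 128}, rng=1)
```

The real benchmarks follow the same shape but return a `Configuration` object and a richer result dict (e.g. wall-clock cost of the actual model training).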

Containerized benchmarks do not rely on external dependencies and thus do not change. To do so, we rely on [Singularity (version 3.5)](https://sylabs.io/guides/3.5/user-guide/).

Further requirements are: [ConfigSpace](https://github.com/automl/ConfigSpace), *scipy* and *numpy*

**Note:** Each benchmark can also be run locally, but the dependencies must be installed manually and might conflict with other benchmarks.
This can be arbitrarily complex and further information can be found in the docstring of the benchmark.

A simple example is the XGBoost benchmark which can be installed with `pip install .[xgboost]`
```python
from hpobench.benchmarks.ml.xgboost_benchmark import XGBoostBenchmark
b = XGBoostBenchmark(task_id=167149)
config = b.get_configuration_space(seed=1).sample_configuration()
result_dict = b.objective_function(configuration=config, fidelity={"n_estimators": 128, "dataset_fraction": 0.5}, rng=1)

```
For more examples see `/example/`.

## Installation

Before we start, we recommend using a virtual environment. To run any benchmark using its singularity container,
run the following:
We recommend using a virtual environment. To install HPOBench, please run the following:
```
git clone https://github.com/automl/HPOBench.git
cd HPOBench
pip install .
```

**Note:** This does not install *singularity (version 3.5)*. Please follow the steps described here: [user-guide](https://sylabs.io/guides/3.5/user-guide/quick_start.html#quick-installation-steps).
**Note:** This does not install *singularity (version 3.6)*. Please follow the steps described here: [user-guide](https://sylabs.io/guides/3.6/user-guide/quick_start.html#quick-installation-steps).
If you run into problems, using the most recent singularity version might help: [here](https://singularity.hpcng.org/admin-docs/master/installation.html)

## Available Containerized Benchmarks
## Containerized Benchmarks

| Benchmark Name | Container Name | Additional Info |
| :-------------------------------- | ------------------ | ------------------------------------ |
| BNNOn* | pybnn | There are 4 benchmarks in total (ToyFunction, BostonHousing, ProteinStructure, YearPrediction) |
| CartpoleFull | cartpole | Not deterministic. |
| CartpoleReduced | cartpole | Not deterministic. |
| SliceLocalizationBenchmark | tabular_benchmarks | Loading may take several minutes. |
| ProteinStructureBenchmark | tabular_benchmarks | Loading may take several minutes. |
| NavalPropulsionBenchmark | tabular_benchmarks | Loading may take several minutes. |
| ParkinsonsTelemonitoringBenchmark | tabular_benchmarks | Loading may take several minutes. |
| NASCifar10*Benchmark | nasbench_101 | Loading may take several minutes. There are 3 benchmarks in total (A, B, C) |
| *NasBench201Benchmark | nasbench_201 | Loading may take several minutes. There are 3 benchmarks in total (Cifar10Valid, Cifar100, ImageNet) |
| NASBench1shot1SearchSpace*Benchmark | nasbench_1shot1 | Loading may take several minutes. There are 3 benchmarks in total (1,2,3) |
| ParamNet*OnStepsBenchmark | paramnet | There are 6 benchmarks in total (Adult, Higgs, Letter, Mnist, Optdigits, Poker) |
| ParamNet*OnTimeBenchmark | paramnet | There are 6 benchmarks in total (Adult, Higgs, Letter, Mnist, Optdigits, Poker) |
| SurrogateSVMBenchmark | surrogate_svm | Random Forest Surrogate of a SVM on MNIST |
| Learna⁺ | learna_benchmark | Not deterministic. |
| MetaLearna⁺ | learna_benchmark | Not deterministic. |
| XGBoostBenchmark⁺ | xgboost_benchmark | Works with OpenML task ids. |
| XGBoostExtendedBenchmark⁺ | xgboost_benchmark | Works with OpenML task ids + contains additional parameter `Booster` |
| SupportVectorMachine⁺ | svm_benchmark | Works with OpenML task ids. |
We provide all benchmarks as containerized versions to (i) isolate their dependencies and (ii) keep them reproducible. Our containerized benchmarks do not rely on external dependencies and thus do not change over time. For this, we rely on [Singularity (version 3.6)](https://sylabs.io/guides/3.6/user-guide/) and for now upload all containers to a [gitlab registry](https://gitlab.tf.uni-freiburg.de/muelleph/hpobench-registry/container_registry).

⁺ these benchmarks are not yet final and might change
The only other requirements are: [ConfigSpace](https://github.com/automl/ConfigSpace), *scipy*, and *numpy*.

**Note:** All containers are uploaded [here](https://gitlab.tf.uni-freiburg.de/muelleph/hpobench-registry/container_registry)
### Run a Benchmark Locally

## Further Notes

### Configure the HPOBench
Each benchmark can also be run locally, but the dependencies must be installed manually and might conflict with other benchmarks. This can be arbitrarily complex and further information can be found in the docstring of the benchmark.
A simple example is the XGBoost benchmark which can be installed with `pip install .[xgboost]`

All of HPOBench's settings are stored in a file, the `hpobenchrc`-file.
It is a yaml file, which is automatically generated at the first use of HPOBench.
By default, it is placed in `$XDG_CONFIG_HOME`. If `$XDG_CONFIG_HOME` is not set, then the
`hpobenchrc`-file is saved to `'~/.config/hpobench'`. When using the containerized benchmarks, the Unix socket is
defined via `$TEMP_DIR`. This is by default `/tmp`. Make sure to have write permissions in those directories.
```python
from hpobench.benchmarks.ml.xgboost_benchmark_old import XGBoostBenchmark

In the `hpobenchrc`, you can specify for example the directory, in that the benchmark containers are
downloaded. We encourage you to take a look into the `hpobenchrc`, to find out more about all
possible settings.
b = XGBoostBenchmark(task_id=167149)
config = b.get_configuration_space(seed=1).sample_configuration()
result_dict = b.objective_function(configuration=config,
fidelity={"n_estimators": 128, "dataset_fraction": 0.5}, rng=1)

```

### How to build a container locally
### How to Build a Container Locally

With singularity installed run the following to build the xgboost container
With singularity installed, run the following to build, e.g., the xgboost container

```bash
cd hpobench/container/recipes/ml
Expand All @@ -116,18 +89,23 @@ config = b.get_configuration_space(seed=1).sample_configuration()
result_dict = b.objective_function(config, fidelity={"n_estimators": 128, "dataset_fraction": 0.5})
```

## Configure HPOBench

All of HPOBench's settings are stored in a file, the `hpobenchrc`-file. It is a .yaml file, which is automatically generated at the first use of HPOBench.
By default, it is placed in `$XDG_CONFIG_HOME` (or, if that is not set, in `'~/.config/hpobench'`). This file defines, among other things, where to store containers and datasets. We highly recommend having a look at this file once it has been created. Furthermore, please make sure you have write permissions in these directories, or adapt the paths if necessary. For more information on where data is stored, please see the section on `HPOBench Data` below.

Furthermore, for running containers, we rely on Unix sockets, which by default are located in `$TEMP_DIR` (or, if that is not set, in `/tmp`).
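The lookup order just described can be sketched as follows. Note that `resolve_config_dir` and `resolve_socket_dir` are illustrative helpers, not part of HPOBench's API, and the exact `hpobench` subdirectory name under `$XDG_CONFIG_HOME` is an assumption:

```python
import os
from pathlib import Path


def resolve_config_dir(env=None):
    """Resolve the hpobenchrc directory: $XDG_CONFIG_HOME, else ~/.config/hpobench."""
    env = os.environ if env is None else env
    xdg = env.get("XDG_CONFIG_HOME")
    base = Path(xdg) if xdg else Path.home() / ".config"
    return base / "hpobench"


def resolve_socket_dir(env=None):
    """Resolve the Unix-socket directory: $TEMP_DIR, else /tmp."""
    env = os.environ if env is None else env
    return Path(env.get("TEMP_DIR", "/tmp"))
```

Passing a custom `env` dict makes the fallback behavior easy to check without touching the real environment.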

### Remove all data, containers, and caches

Update: In version 0.0.8, we have added the script `hpobench/util/clean_up_script.py`. It allows to easily remove all
data, downloaded containers, and caches. To get more information, you can use the following command.
Feel free to use `hpobench/util/clean_up_script.py` to remove all data, downloaded containers and caches:
```bash
python ./hpobench/util/clean_up_script.py --help
```

If you like to delete only specific parts, i.e. a single container,
you can find the benchmark's data, container, and caches in the following directories:
If you want to delete only specific parts, e.g. a single container, you can find the benchmarks' data, containers, and caches in the following directories:

#### HPOBench data
#### HPOBench Data
HPOBench stores downloaded containers and datasets at the following locations:

```bash
Expand All @@ -138,20 +116,20 @@ $XDG_DATA_HOME # ~/.local/share/hpobench

After crashes or when containers are not shut down properly, there might be leftover socket files under `/tmp/hpobench_socket`.
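A small sketch of how one might list such leftovers before removing them by hand; `find_leftover_sockets` is an illustrative helper (not part of HPOBench), and it assumes the path is a directory holding per-instance socket files:

```python
from pathlib import Path


def find_leftover_sockets(socket_dir="/tmp/hpobench_socket"):
    """Return paths left in the socket directory; empty list if the directory is absent."""
    root = Path(socket_dir)
    if not root.is_dir():
        return []
    return sorted(root.iterdir())
```

Inspect the returned paths before deleting anything, since a running container may still be using its socket.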

#### OpenML data
#### OpenML Data

OpenML additionally maintains its own cache, which is located at `~/.openml/`.

#### Singularity container
#### Singularity Containers

Singularity additionally maintains its own cache, which can be removed with `singularity cache clean`.

### Use HPOBench benchmarks in research projects
### Use HPOBench Benchmarks in Research Projects

If you use a benchmark in your experiments, please specify the version number of the HPOBench as well as the version of
the used container. When starting an experiment, HPOBench writes automatically the 2 version numbers to the log.
the used container to ensure reproducibility. When starting an experiment, HPOBench automatically writes these two version numbers to the log.

### Troubleshooting
### Troubleshooting and Further Notes

- **Singularity throws an 'Invalid Image format' exception**
Use a singularity version > 3. Users of the Meta-Cluster in Freiburg have to set the following path:
Expand All @@ -160,13 +138,6 @@ the used container. When starting an experiment, HPOBench writes automatically t
- **A Benchmark fails with `SystemError: Could not start an instance of the benchmark. Retried 5 times` but the container
can be started locally with `singularity instance start <pathtocontainer> test`**
See whether in `~/.singularity/instances/sing/$HOSTNAME/*/` there is a file that does not end with '}'. If yes delete this file and retry.

## Status

Status for Master Branch:
[![Build Status](https://github.com/automl/HPOBench/workflows/Test%20Pull%20Requests/badge.svg?branch=master)](https://github.com/automl/HPOBench/actions)
[![codecov](https://codecov.io/gh/automl/HPOBench/branch/master/graph/badge.svg)](https://codecov.io/gh/automl/HPOBench)

Status for Development Branch:
[![Build Status](https://github.com/automl/HPOBench/workflows/Test%20Pull%20Requests/badge.svg?branch=development)](https://github.com/automl/HPOBench/actions)
[![codecov](https://codecov.io/gh/automl/HPOBench/branch/development/graph/badge.svg)](https://codecov.io/gh/automl/HPOBench)
**Note:** If you are looking for a different or older version of our benchmarking library, you might be looking for
[HPOlib1.5](https://github.com/automl/HPOlib1.5)
11 changes: 11 additions & 0 deletions changelog.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,14 @@
# 0.0.9
* Add new Benchmarks: Tabular Benchmarks.
Provided by @Neeratyoy.
* New Benchmark: ML Benchmark Class
This new benchmark class offers a unified interface for XGB, SVM, MLP, HISTGB, RF, LR benchmarks operating on OpenML
tasks.
Provided by @Neeratyoy.
* This is the version used for the paper:
"HPOBench: A Collection of Reproducible Multi-Fidelity Benchmark Problems for HPO" (Eggensperger et al.)
https://openreview.net/forum?id=1k4rJYEwda-

# 0.0.8
* Improve container integration
The containers had some problems when the file system was read-only. In this case, the home directory, which contains the
Expand Down
4 changes: 2 additions & 2 deletions ci_scripts/install.sh
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ install_packages=""

if [[ "$RUN_TESTS" == "true" ]]; then
echo "Install tools for testing"
install_packages="${install_packages}xgboost,pytest,test_paramnet,"
install_packages="${install_packages}xgboost,pytest,test_paramnet,test_tabular_datamanager,"
pip install codecov

# The param net benchmark does not work with a scikit-learn version != 0.23.2. (See notes in the benchmark)
Expand Down Expand Up @@ -65,7 +65,7 @@ if [[ "$USE_SINGULARITY" == "true" ]]; then
sudo make -C builddir install

cd ..
install_packages="${install_packages}singularity,"
install_packages="${install_packages}placeholder,"
else
echo "Skip installing Singularity"
fi
Expand Down
2 changes: 2 additions & 0 deletions ci_scripts/install_singularity.sh
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@ elif [[ "$SINGULARITY_VERSION" == "3.6" ]]; then
export VERSION=3.6.4
elif [[ "$SINGULARITY_VERSION" == "3.7" ]]; then
export VERSION=3.7.3
elif [[ "$SINGULARITY_VERSION" == "3.8" ]]; then
export VERSION=3.8.0
else
echo "Skip installing Singularity"
fi
Expand Down
2 changes: 1 addition & 1 deletion examples/local/xgboost_local.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
import argparse
from time import time

from hpobench.benchmarks.ml.xgboost_benchmark import XGBoostBenchmark as Benchmark
from hpobench.benchmarks.ml.xgboost_benchmark_old import XGBoostBenchmark as Benchmark
from hpobench.util.openml_data_manager import get_openmlcc18_taskids


Expand Down
4 changes: 4 additions & 0 deletions extra_requirements/ml_mfbb.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
{
"ml_tabular_benchmarks": ["tqdm","pandas==1.2.4","scikit-learn==0.24.2","openml==0.12.2","xgboost==1.3.1"],
"ml_mfbb": ["tqdm","pandas==1.2.4","scikit-learn==0.24.2","openml==0.12.2","xgboost==1.3.1"]
}
3 changes: 3 additions & 0 deletions extra_requirements/outlier_detection.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
{
"outlier_detection": ["torch==1.9.0", "pytorch_lightning==1.3.8", "scikit-learn==0.24.2"]
}
3 changes: 2 additions & 1 deletion extra_requirements/tests.json
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
{
"codestyle": ["pycodestyle","flake8","pylint"],
"pytest": ["pytest>=4.6","pytest-cov"],
"test_paramnet": ["tqdm", "scikit-learn==0.23.2"]
"test_paramnet": ["tqdm", "scikit-learn==0.23.2"],
"test_tabular_datamanager": ["pyarrow", "fastparquet"]
}
