Commit 4b75ac8

Merge branch 'prototype_hyper' of https://github.com/wandb/llm-leaderboard into prototype_hyper

olachinkei committed Jun 30, 2024
2 parents: 388b8f5 + 5a957e7

Showing 3 changed files with 98 additions and 4 deletions.
README.md (6 changes: 3 additions & 3 deletions)

```diff
@@ -95,7 +95,7 @@ Below, you will find a detailed description of the variables utilized in the `ba
 - **inference_interval:** Set the inference interval in seconds. This is particularly useful when rate limits apply, such as with APIs.
 - **run:** Set to true for each evaluation dataset you want to run.
 - **model:** Information about the model.
-  - `use_wandb_artifacts`: Whether to use WandB artifacts for the model usage.
+  - `artifacts_path`: Path of the wandb artifacts where the model is located.
   - `max_model_len`: Maximum token length of the input.
   - `chat_template`: Path to the chat template file. This is required for open-weights models.
   - `dtype`: Data type. Choose from float32, float16, bfloat16.
```
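Assembled from the fields above, a `model` block in the base config might look like the following sketch (the artifact path, template path, and values are illustrative, not taken from the repository):

```yaml
model:
  artifacts_path: my-entity/my-project/my-model:v0   # hypothetical W&B artifact path
  max_model_len: 2048                                # maximum input token length
  chat_template: chat_templates/my_template.jinja    # hypothetical path; required for open-weights models
  dtype: bfloat16                                    # choose from float32, float16, bfloat16
```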
```diff
@@ -163,7 +163,7 @@ This framework supports evaluating models using APIs such as OpenAI, Anthropic,
 - `run_name`: Name of the W&B run.
 - **api:** Choose the API to use from `openai`, `anthropic`, `google`, `amazon_bedrock`.
 - **batch_size:** Batch size for API calls (recommended: 32).
-- **model:** Information about the model.
+- **model:** Information about the model.
   - `pretrained_model_name_or_path`: Name of the API model.
   - `size_category`: Specify "api" to indicate using an API model.
   - `size`: Model size (leave as null for API models).
```
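For reference, an API-model block along these lines might look like this sketch (the model name echoes a config file mentioned later in this commit; other values are illustrative):

```yaml
api: openai
batch_size: 32                                       # recommended batch size for API calls
model:
  pretrained_model_name_or_path: gpt-4o-2024-05-13   # name of the API model
  size_category: api                                 # indicates an API model
  size: null                                         # leave as null for API models
```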
```diff
@@ -179,7 +179,7 @@ This framework also supports evaluating models using VLLM. You need to create a
 - **num_gpus:** Number of GPUs to use.
 - **batch_size:** Batch size for VLLM (recommended: 256).
 - **model:** Information about the model.
-  - `use_wandb_artifacts`: Set to true if you want to use wandb artifacts.
+  - `artifacts_path`: Path of the wandb artifact where the model is located. Required when loading a model from wandb artifacts; otherwise it can be omitted. Example: `wandb-japan/llm-leaderboard/llm-jp-13b-instruct-lora-jaster-v1.0:v0`
   - `pretrained_model_name_or_path`: Name of the VLLM model.
   - `chat_template`: Path to the chat template file (if needed).
   - `size_category`: Specify the model size category. In the Nejumi Leaderboard, the categories are defined as "10B<", "10B<= <30B", "<=30B", and "api".
```
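A VLLM model block using these fields might look like the following sketch (the artifact path is the example quoted above; the model name, template path, and size category are illustrative):

```yaml
num_gpus: 1
batch_size: 256
model:
  artifacts_path: wandb-japan/llm-leaderboard/llm-jp-13b-instruct-lora-jaster-v1.0:v0  # example path from the docs above
  pretrained_model_name_or_path: llm-jp-13b-instruct                                   # hypothetical model name
  chat_template: chat_templates/my_template.jinja                                      # hypothetical path (if needed)
  size_category: "10B<= <30B"                                                          # one of the Nejumi Leaderboard categories
```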
blend_run_configs/README.md (91 changes: 91 additions & 0 deletions, new file)

# Blender

This feature allows you to blend the inference results of multiple past runs, or to resume an interrupted run by carrying over its previous results.

## Overview

The Blender feature provides two main functionalities:

1. **Blend Run**: Merges the results of multiple past runs to analyze the aggregated output.
2. **Resume Run**: Continues a previous run from where it left off, carrying over the results and configurations.

These features are particularly useful in scenarios where you need to combine results from different experiments or when an experiment is interrupted and needs to be resumed without losing the previous progress.

## Blend Run

### Setting up blend_config

1. Copy the template and configure it properly:
```bash
cp -ip blend_run_configs/config_template.yaml blend_run_configs/blend_config.yaml
```

2. The `blend_config.yaml` requires the following settings:
   - `num_few_shots`: Number of few-shot examples to use.
   - `model`: Information about the model.
   - `new_run`: Information for the new run.
   - `old_runs`: Specify the tasks you want to carry over from past runs.

For detailed information on each setting, refer to the [Blender Configuration Settings](#blender-configuration-settings) section.
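As a sketch, a `blend_config.yaml` for blending might look like this (the entity, project, run path, and task names are all hypothetical):

```yaml
run_chain: false                                 # false selects the blend feature
num_few_shots: 4
model:
  use_wandb_artifacts: false
  pretrained_model_name_or_path: my-model        # hypothetical model name
  size_category: api
  size: null
  release_date: 05/13/2024
new_run:
  entity: my-entity                              # hypothetical W&B entity
  project: my-project
  run_name: my-blended-run
old_runs:
  - run_path: my-entity/my-project/abc12345      # hypothetical W&B run path
    dataset:
      - jaster                                   # illustrative task name
```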

### Running the Blend Script

3. Run the blend script:
```bash
python3 scripts/blend_run.py
```

### Checking the W&B Dashboard

4. The results of the evaluation will be logged to the specified W&B project.

## Resume Run

### Setting up blend_config

1. Copy the template and configure it properly:
```bash
cp -ip blend_run_configs/config_template.yaml blend_run_configs/blend_config.yaml
```

2. The `blend_config.yaml` requires the following settings:
   - `run_chain`: Set to `true`.
   - `old_runs`: Specify the tasks you want to carry over from past runs.

For detailed information on each setting, refer to the [Blender Configuration Settings](#blender-configuration-settings) section.
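As a sketch, a `blend_config.yaml` for resuming might look like this (the run path and task names are hypothetical):

```yaml
run_chain: true                                  # true selects the resume feature; model and new_run are not required
old_runs:
  - run_path: my-entity/my-project/abc12345      # hypothetical run path of the interrupted run
    dataset:
      - jaster                                   # carried over from the old run
      # - mtbench                                # commented out: not carried over
```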

### Running the Evaluation Script

3. You can use either the `-c` or `-s` option:
   - **-c (config)**: Specify the config file by its name, e.g.,
     ```bash
     python3 scripts/run_eval.py -c config-gpt-4o-2024-05-13.yaml
     ```
   - **-s (select-config)**: Select from a list of available config files. This option is useful if you have multiple config files.
     ```bash
     python3 scripts/run_eval.py -s
     ```

### Checking the W&B Dashboard

4. The results of the evaluation will be logged to the specified W&B project.

## Blender Configuration Settings

- **run_chain**: Set to `false` to use the blend feature, or `true` to use the resume feature.
- **num_few_shots**: Number of few-shot examples to use.
- **model**: Information about the model. (Not required when using the resume feature.)
  - `use_wandb_artifacts`: Set to `true` if you want to use W&B artifacts.
  - `pretrained_model_name_or_path`: Name of the VLLM model.
  - `chat_template`: Path to the chat template file (if needed).
  - `size_category`: Specify "api" to indicate using an API model.
  - `size`: Model size (number of parameters).
  - `release_date`: Model release date (MM/DD/YYYY).
  - `max_model_len`: Maximum token length of the input (if needed).
- **new_run**: Information used for Weights & Biases (W&B) logging. (Not required when using the resume feature.)
  - `entity`: Name of the W&B entity.
  - `project`: Name of the W&B project.
  - `run_name`: Name of the W&B run. Set the run name in a model-specific config.
- **old_runs**: Specify the tasks you want to carry over from past runs. Multiple runs may be specified.
  - `run_path`: W&B run path of the old run.
  - `dataset`: List of tasks to carry over. Comment out tasks that do not need to be carried over.
scripts/vllm_server.py (5 changes: 4 additions & 1 deletion)

```diff
@@ -17,7 +17,7 @@ def start_vllm_server():
     cfg = instance.config
     run = instance.run
 
-    model_artifact_path = cfg.model.get("artifact_path", None)
+    model_artifact_path = cfg.model.get("artifacts_path", None)
     if model_artifact_path is not None:
         artifact = run.use_artifact(model_artifact_path, type='model')
         artifact = Path(artifact.download())
@@ -49,11 +49,14 @@ def run_vllm_server():
"--max-model-len", str(cfg.model.max_model_len),
"--max-num-seqs", str(cfg.batch_size),
"--tensor-parallel-size", str(cfg.get("num_gpus", 1)),
"--device", cfg.model.device_map,
"--seed", "42",
"--uvicorn-log-level", "warning",
"--disable-log-stats",
"--disable-log-requests",
]
if cfg.model.trust_remote_code:
command.append("--trust-remote-code")

# subprocessでサーバーをバックグラウンドで実行
process = subprocess.Popen(command)
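After this change, the server script reads `artifacts_path`, `device_map`, and `trust_remote_code` from the model config. A minimal sketch of the keys it now expects (values are illustrative):

```yaml
num_gpus: 1
batch_size: 256
model:
  artifacts_path: null        # or a W&B artifact path, e.g. my-entity/my-project/my-model:v0 (hypothetical)
  max_model_len: 2048
  device_map: cuda            # forwarded to vllm's --device flag
  trust_remote_code: true     # appends --trust-remote-code to the server command
```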
