Documented openorca download recipe
Akshat-Tripathi committed Oct 24, 2024
1 parent 042dd3a commit c08f9f8
Showing 1 changed file with 15 additions and 0 deletions.
15 changes: 15 additions & 0 deletions dataset_openorca_mlperf_recipe/docs_axs.md
@@ -0,0 +1,15 @@
## Usage

This recipe downloads the OpenOrca dataset used for Llama2-70b inference submissions.

### Command Anatomy
Run only the second producer rule, using the query below.

```bash
# model_family: the model family to use. Other model families need further changes,
#   because the input column contains llama2-specific tags that must be converted.
# total_samples: the total number of samples to convert.
axs byquery downloaded,dataset_name=openorca,model_family=llama2,variant=7b,total_samples=24576
```
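
If the parameters need to vary between runs, the query can be assembled from shell variables. The following is a minimal sketch, assuming the whole query is passed to `axs byquery` as one comma-separated argument with no spaces; the variable names (`DATASET_NAME`, `MODEL_FAMILY`, `VARIANT`, `TOTAL_SAMPLES`, `QUERY`) are illustrative and not part of `axs`. It only prints the resulting command so it can be checked before running.

```bash
# Illustrative variable names (assumptions, not defined by axs).
DATASET_NAME=openorca
MODEL_FAMILY=llama2
VARIANT=7b
TOTAL_SAMPLES=24576

# Join the tag and the key=value pairs with commas and no whitespace,
# so the shell passes the query as a single argument.
QUERY="downloaded,dataset_name=${DATASET_NAME},model_family=${MODEL_FAMILY},variant=${VARIANT},total_samples=${TOTAL_SAMPLES}"

# Print the command for inspection instead of executing it.
echo "axs byquery ${QUERY}"
```

Once the printed command looks right, run it directly as `axs byquery "${QUERY}"`.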
### Misc
Make sure you've downloaded the relevant tokeniser first.
