Merge pull request #305 from mlcommons/rgat_policies
Add rules for R-Gat
mrmhodak authored Dec 18, 2024
2 parents b03e0e9 + f2771c9 commit b697192
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions inference_rules.adoc
@@ -182,6 +182,7 @@ Each sample has the following definition:
|SDXL |A pair of positive and negative prompts
|Llama2 |one sequence
|Mixtral-8x7B |one sequence
|RGAT |one node id
|Llama3.1-405B |one sequence
|===

@@ -261,6 +262,7 @@ The Datacenter suite includes the following benchmarks:
|Language |Text Generation (Question Answering, Math and Code Generation) |Mixtral-8x7B |OpenOrca (5k samples, max_seq_len=2048), GSM8K (5k samples of the train split, max_seq_len=2048), MBXP (5k samples, max_seq_len=2048) | 15000 | 99% of FP16 ((OpenOrca)rouge1=45.5989, (OpenOrca)rouge2=23.3526, (OpenOrca)rougeL=30.4608, (gsm8k)Accuracy=73.66, (mbxp)Accuracy=60.16). Additionally, for both cases the tokens per sample should be between 90% and 110% of the reference (tokens_per_sample=144.84)| TTFT/TPOTfootnote:[For Mixtral-8x7B, 2 latency metrics are collected - time to first token (TTFT) which measures the latency of the first token, and time per output token (TPOT) which measures the average interval between all the tokens generated.]: 2000 ms/200 ms
|Commerce |Recommendation |DLRMv2 |Synthetic Multihot Criteo Dataset | 204800 |99% of FP32 and 99.9% of FP32 (AUC=80.31%) | 60 ms
|Generative |Text to image |SDXL |Subset of coco-2014 val | 5000 |FID range: [23.01085758, 23.95007626] and CLIP range: [31.68631873, 31.81331801] | 20 s
|Graph |Node classification |RGAT |IGBH | 788379 |99% of FP32 (72.86%) | N/A
|===

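The tokens-per-sample window in the Mixtral-8x7B row above can be expressed as a small sketch (the helper name and the sample counts are illustrative, not part of the official harness):

```python
# Sketch (assumption, not official tooling): check that the mean generated
# tokens per sample fall within 90%-110% of the reference value for Mixtral-8x7B.
REFERENCE_TOKENS_PER_SAMPLE = 144.84  # from the accuracy constraint above

def tokens_per_sample_ok(total_tokens: int, num_samples: int) -> bool:
    """Return True if the mean output length is within the allowed window."""
    mean = total_tokens / num_samples
    return (0.9 * REFERENCE_TOKENS_PER_SAMPLE
            <= mean
            <= 1.1 * REFERENCE_TOKENS_PER_SAMPLE)

print(tokens_per_sample_ok(145_000, 1_000))  # mean 145.0 -> True
print(tokens_per_sample_ok(200_000, 1_000))  # mean 200.0 -> False
```
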
Each Datacenter benchmark *requires* the following scenarios:
@@ -274,6 +276,7 @@ Each Datacenter benchmark *requires* the following scenarios:
|Language |Question Answering |Server, Offline
|Commerce |Recommendation |Server, Offline
|Generative |Text to image |Server, Offline
|Graph |Node classification |Offline
|===

The Edge suite includes the following benchmarks:
@@ -563,6 +566,8 @@ Allow any lossless compression that will be suitable for production use.
In Server mode allow per-Query compression.
|Generative | Text to image | SDXL | No compression allowed.

|Graph | Node classification | RGAT | No compression allowed.

|===

. Compression scheme needs pre-approval, at least two weeks before a submission deadline.
@@ -900,6 +905,12 @@ Q: Is it allowed to apply continuous batching (or dynamic batching) for auto-generative benchmarks?

A: Yes. Continuous batching is explained at a high level here: https://www.anyscale.com/blog/continuous-batching-llm-inference.

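At a high level, continuous batching can be sketched as a scheduler loop in which finished sequences free their batch slot between decode steps, rather than the whole batch waiting for its slowest member (an illustrative toy, not any particular serving stack):

```python
from collections import deque

# Illustrative sketch of continuous (in-flight) batching: finished sequences
# leave the batch and waiting requests join it between decode steps.
def continuous_batching(requests, max_batch=4):
    waiting = deque(requests)   # (request_id, tokens_to_generate)
    active = {}                 # request_id -> tokens remaining
    completed = []
    while waiting or active:
        # Admit new requests into freed batch slots.
        while waiting and len(active) < max_batch:
            rid, n = waiting.popleft()
            active[rid] = n
        # One decode step: every active sequence emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:    # sequence finished; its slot is freed
                del active[rid]
                completed.append(rid)
    return completed

print(continuous_batching([("a", 2), ("b", 5), ("c", 1), ("d", 3), ("e", 2)]))
# -> ['c', 'a', 'd', 'e', 'b']: short sequences finish and yield their slots
#    while the long sequence "b" keeps decoding.
```
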
=== RGAT

Q: Is loading the node neighbors a timed operation?

A: Yes, this is the main operation of this benchmark.

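A sketch of what this implies for a harness: the latency measurement starts before neighbor sampling, not after it (the helpers below are hypothetical stand-ins, not the real IGBH sampler or RGAT model):

```python
import time

# Hypothetical sketch: per the FAQ above, neighbor loading (graph sampling)
# sits inside the timed portion of an RGAT query, alongside the forward pass.
def fetch_neighbors(node_id):
    # Stand-in for IGBH neighborhood sampling; a real harness samples a
    # multi-hop neighborhood from the heterogeneous graph.
    return [node_id * 10 + k for k in range(3)]

def classify(subgraph):
    # Stand-in for the RGAT forward pass over the sampled subgraph.
    return sum(subgraph) % 2

def timed_query(node_id):
    start = time.perf_counter()
    subgraph = fetch_neighbors(node_id)   # timed: the main operation
    label = classify(subgraph)            # timed: model inference
    latency = time.perf_counter() - start
    return label, latency
```
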
=== Audit

Q: What characteristics of my submission will make it more likely to be audited?
@@ -1042,6 +1053,7 @@ Datacenter systems must provide at least the following bandwidths from the network:
|Language |Mixtral-8x7B |OpenOrca (5k samples, max_seq_len=2048), GSM8K (5k samples of the train split, max_seq_len=2048), MBXP (5k samples, max_seq_len=2048) | __num_inputs*max_seq_len*dtype_size__ | __2048*dtype_size__ | __throughput*2048*dtype_size__
|Commerce |DLRMv2 | 1TB Click Logs |__avg(num_pairs_per_sample)*(num_numerical_inputs*dtype_size~1~ + num_categorical_inputs*dtype_size~2~)__footnote:[Each DLRMv2 sample consists of up to 700 user-item pairs drawn from the distribution specified in https://github.com/mlcommons/inference/blob/master/recommendation/dlrm/pytorch/tools/dist_quantile.txt[dist_quantile.txt].] |__270*(13*dtype_size~1~+26*dtype_size~2~)__ | __throughput*270*(13*dtype_size~1~+26*dtype_size~2~)__
|Generative |SDXL |Subset of coco-2014 val captions (max_prompt_len=77) | __num_inputs*max_prompt_len*dtype_size__ | __77*dtype_size__ | __throughput*77*dtype_size__
|Graph |RGAT |IGBH | negligible | negligible | __> 0__
|===

=== Egress Bandwidth

@@ -1059,4 +1071,5 @@ Datacenter systems must provide at least the following bandwidths from the outputs:
|Language |Mixtral-8x7B |OpenOrca (5k samples, max_seq_len=2048), GSM8K (5k samples of the train split, max_seq_len=2048), MBXP (5k samples, max_seq_len=2048) | __max_output_len*dtype_size__ | __2048*dtype_size__ | __throughput*2048*dtype_size__
|Commerce |DLRMv2 |Synthetic Multihot Criteo Dataset | negligible | negligible | __> 0__
|Generative |SDXL |Subset of coco-2014 val captions (max_prompt_len=77) | __3,145,728*dtype_size__ | __throughput*3,145,728*dtype_size__ | __> 0__
|Graph |RGAT |IGBH | negligible | negligible | __> 0__
|===
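
The bandwidth formulas in both tables share one shape: sustained throughput times the per-sample payload in bytes. A minimal sketch, assuming illustrative throughput figures and fp16 (dtype_size=2) payloads:

```python
# Sketch (assumptions: fp16 payloads, made-up throughput numbers): evaluate
# the minimum bandwidth formulas from the tables above, in bytes per second.
def min_bandwidth(throughput_sps, elements_per_sample, dtype_size_bytes):
    """Bytes/second the link must sustain: throughput * per-sample bytes."""
    return throughput_sps * elements_per_sample * dtype_size_bytes

# SDXL ingress: throughput * 77 * dtype_size (77 = max prompt length)
print(min_bandwidth(2.0, 77, 2))      # -> 308.0 bytes/s
# Mixtral-8x7B: throughput * 2048 * dtype_size (2048 = max_seq_len)
print(min_bandwidth(100.0, 2048, 2))  # -> 409600.0 bytes/s
```

For rows marked "negligible", such as RGAT (only a node id crosses the network per sample), the requirement degenerates to any nonzero bandwidth, hence the __> 0__ entries.
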
