Does end_on_device make sense? #213
Thinking about real applications, start_on_device makes sense because you can imagine streaming images, text, or other inputs directly from the network. There's some subtlety in that a real application would probably stream compressed images, and decompression might have to use the CPU. For end_on_device to make sense, there would have to be real applications where the output of an inference streams directly out to the network, but for most real applications, inference is not the last step in the pipeline before sending data back to a user.
I agree inference is rarely the last pipeline step. However, if your accelerator is a general-purpose programmable device, it's realistic for it to run post-processing too - for example, in the current proposal for 3D UNet, where overlapping 128x128x128 tiles are recombined into a full segmented image, that recombination would best be done on the accelerator. (This is mainly an issue for segmentation workloads, since those are the ones with large data outputs.) And then that combined image might indeed go straight to the network.
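For concreteness, here is a minimal sketch of the kind of tile recombination being discussed: overlapping 128x128x128 logit tiles accumulated into the full volume and averaged where they overlap. The function and variable names are hypothetical, and the uniform averaging is an assumption (a real implementation might use Gaussian importance weighting); this is not the reference implementation.

```python
import numpy as np

def stitch_tiles(tile_logits, tile_origins, volume_shape, tile_size=(128, 128, 128)):
    """Accumulate per-tile logits into the full volume and normalize overlaps.

    tile_logits: list of arrays shaped (C, 128, 128, 128)
    tile_origins: list of (z, y, x) offsets of each tile in the volume
    volume_shape: (C, Z, Y, X) of the full output
    Tiles are assumed to lie fully inside the volume.
    """
    accum = np.zeros(volume_shape, dtype=np.float32)
    weight = np.zeros(volume_shape[1:], dtype=np.float32)
    dz, dy, dx = tile_size
    for logits, (z, y, x) in zip(tile_logits, tile_origins):
        accum[:, z:z+dz, y:y+dy, x:x+dx] += logits
        weight[z:z+dz, y:y+dy, x:x+dx] += 1.0
    accum /= np.maximum(weight, 1e-6)   # average logits where tiles overlap
    return accum.argmax(axis=0)         # per-voxel class labels (Z, Y, X)
```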
For 3D-UNet, didn't we agree that the server scenario made no sense, which is why it is offline only? I think the start_on_device rule is getting unwieldy to enforce. We should just move to injecting queries over the network; then submitters that implement NIC-to-accelerator DMAs will be able to measure the benefit directly.
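To illustrate what "injecting queries over the network" might look like, here is a toy sketch of a receiver that accepts length-prefixed query payloads over a socket and hands them to the system under test. This is not LoadGen code; the framing, port, and function names are assumptions for illustration only.

```python
import socket
import struct

def _recv_exact(conn, n):
    """Read exactly n bytes, or return None if the peer closed the connection."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf

def serve_queries(handle_query, port=9000):
    """Accept one client and feed length-prefixed query payloads to the SUT."""
    with socket.create_server(("", port)) as srv:
        conn, _ = srv.accept()
        with conn:
            while True:
                header = _recv_exact(conn, 8)
                if header is None:
                    break
                (size,) = struct.unpack("!Q", header)
                payload = _recv_exact(conn, size)
                if payload is None:
                    break
                # A submitter with NIC-to-accelerator DMA could land this
                # payload directly in device memory instead of host DRAM.
                handle_query(payload)
```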
The timeline for getting that into 1.1 seems quite short, given there's no proposal yet.
And regarding 3DUNet not being in server... that's correct, but latency is still relevant for 3DUNet in Edge Single Stream.
Is this issue 3D-UNet specific?
It's significant only for benchmarks where the output size is large. Today, that's only segmentation.
Can we get an opinion from the medical advisory board?
I'm sure we can, but what are we looking for, and how would we act on it? MLPerf has, historically, set some fairly arbitrary bounds on the timed portion of the benchmark. One thing we could meaningfully ask is for them to suggest what should be timed, and then address the question "what does the post-processing after the timed portion look like for this model?" Then there are three cases:
In case (1), end-from-device can use the same rules as start-from-device. Case (2) is straightforwardly "no". Case (3), which I expect will be true in at least some use cases, requires us to make rules to determine whether an accelerator can do the post-processing. Given that the biggest difficulty we struggled with in 1.0 was the tension between rules that are simple to arbitrate and rules that don't force costs on submitters that they wouldn't incur in production, this doesn't seem like it will help. How else could we frame the question to the board?
Do we need to make a rule? We should just add the post-processing to the benchmark. Then submitters can implement the post-processing on the host or device depending on the capabilities of their system.
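A minimal sketch of that suggestion, assuming a backend object with an invented capability flag (the interface here is hypothetical, not part of any MLPerf API), reusing the stitch_tiles sketch from earlier in the thread:

```python
def finish_inference(backend, tile_logits, tile_origins, volume_shape):
    """Run the timed post-processing on host or device, whichever the system supports."""
    if backend.supports_device_postprocessing():
        # Stitching runs on the accelerator; only the final segmentation
        # (or nothing, if it streams straight to the network) returns to the host.
        return backend.stitch_on_device(tile_logits, tile_origins, volume_shape)
    # Otherwise copy logits back and stitch on the host CPU.
    host_logits = [backend.to_host(t) for t in tile_logits]
    return stitch_tiles(host_logits, tile_origins, volume_shape)
```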
That would be my preference regardless of this question. If we do that, does that mean we should assume the answer is (1) above?
I think we should ask the Medical Advisory Board:
My current thinking is that end_on_host is probably appropriate for 3D-UNet if we add timed post-processing, but I would like to have confirmation from an expert. Unlike most of the other applications in MLPerf, there's not a good 3D-UNet analogue at Google, so I am very reluctant to make changes without consulting an expert.
For clarification purposes, I want to share here (thanks @pabloribalta) what the current reference implementation for 3D-UNet in the Training Benchmark is:
Notes:
We believe that steps 1-5 should not count against benchmark time, and steps 6-8 could be left to the submitters to optimize (i.e. change the tile size) as long as they meet the expected accuracy or above. Indeed, step 8 can probably take place on an accelerator. However, it is my understanding that the Inference closed division doesn't allow hyperparameter changes, so submitters have to go with 128x128x128. Is this true? @tjablin the resulting stitched output may be either displayed on the screen, sent to the network, or stored for later viewing.
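To make the tile-size question concrete, here is a small sketch of how a sliding-window tiling schedule might be parameterized by tile size and overlap. The parameter values and function name are illustrative assumptions, not the reference implementation's choices.

```python
def tile_origins(volume_shape, tile_size=(128, 128, 128), overlap=0.5):
    """Return (z, y, x) origins of overlapping tiles covering a (Z, Y, X) volume."""
    starts = []
    for dim, tile in zip(volume_shape, tile_size):
        stride = max(1, int(tile * (1.0 - overlap)))
        last = max(dim - tile, 0)
        axis = list(range(0, last + 1, stride))
        if axis[-1] != last:
            axis.append(last)   # ensure the final tile reaches the volume edge
        starts.append(axis)
    return [(z, y, x) for z in starts[0] for y in starts[1] for x in starts[2]]
```

Allowing submitters to vary tile_size or overlap is exactly the kind of change that would be off the table if the closed division treats them as fixed hyperparameters.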
The hyperparameter question is somewhat off-topic here: I've opened a new issue #216 |
Update rules to allow end-on-device for 3DUNet as per discussion in mlcommons#213.
The rationale for start_from_device is that submissions should not need to incur the overhead of transfer from system DRAM if there is a mechanism whereby network inputs can be delivered directly into accelerator memory.
Is end_on_device symmetric in this regard - i.e. submitters should not have to incur the overhead of transfer to system DRAM if the accelerator has the equivalent outbound capability?
@tjablin opinion?