Unable to reproduce iOS benchmark values #243

Closed
kanaukou-google opened this issue Aug 24, 2023 · 13 comments

@kanaukou-google

I am unable to reproduce claimed benchmark values for running Stable Diffusion model on iOS devices.

Configuration

  • MBP M1 Max, 64 GB RAM, macOS 13.5
  • Xcode 15.0.0 beta 5
  • iPhone 14 Pro iOS 17 beta 4

Steps to reproduce:

  • Build and run the example app from the sources with the said Xcode version on the said device.
  • Within the app press 'Generate', on completion observe the processing time in the label below.
  • Whether it is a cold run or a consecutive one, the latency never falls below 11 s, which is ~20% slower than the claimed benchmark for the iPhone 14 Pro Max.

Could anyone clarify whether there is anything specific I need to adjust in my setup to reproduce the results?

  • Should I use a different example Swift app?
  • Should I use the specific Xcode 15 / iOS 17 beta versions that were used for the benchmarks?
  • Is there a difference between the iPhone 14 Pro and the iPhone 14 Pro Max that would explain such a discrepancy?
@atiorh
Collaborator

atiorh commented Aug 24, 2023

Hello! Your setup sounds accurate to me. Which version of Stable Diffusion are you benchmarking?

@kanaukou-google
Author

I made no changes to the example app. In my setup it seems to use stable-diffusion-2-1-base-palettized, while the benchmarks refer to stable-diffusion-2-1-base.

@atiorh
Collaborator

atiorh commented Aug 25, 2023

That is also correct. A few things:

  • We benchmarked with iOS 17 Seed 1 and 7f9c58a (the commit on main with the 1.0.0 release), so if any performance number is off with Seed 5 and the current main commit, it would be a regression we need to investigate.
  • @pcuenca Do you mind testing with Seed 5 to see if your previous numbers on 14 Pro regress too?
  • @kanaukou-google Are you observing the 2.3 iter/sec or is that also lower? It will give us a sense of whether ML perf degraded or some non-ML code is slower for some yet unknown reason.

@kanaukou-google
Author

@atiorh I added a log statement above L117 to print the supposed iter/sec value and got values of ~2.72 across several consecutive runs.

@atiorh
Collaborator

atiorh commented Aug 27, 2023

Hmm, that sounds even faster than what we published (2.72 vs 2.3 iter/sec) and it should have finished in ~8 seconds with that throughput. I will wait for Pedro to repro his measurements from June and also rerun our measurements on Seed 5 this week.

@atiorh
Collaborator

atiorh commented Aug 27, 2023

@kanaukou-google Oh one more thing, could you please verify that reduceMemory is not enabled? It will add 1-2 seconds for loading/unloading resources during generation.

@pcuenca
Contributor

pcuenca commented Aug 28, 2023

I no longer have access to the iPhone 14 Pro I used for the tests, but I repeated them on my iPhone 13 Pro running iOS 17 beta 7 (21A5319a). Some observations:

  • reduceMemory was indeed defaulting to true because of this test. This beta of iOS 17 reports 5917753344 bytes of physicalMemory. I knew this varies among devices, but this number is lower than the lower threshold I had set before.
  • The number of scheduling steps we used to benchmark is 20, whereas the app's default is 25.
  • The app now uses in-progress previews that we need to disable to replicate the original benchmark testing conditions.
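
For reference, the reported physicalMemory value works out to roughly 5.51 GiB, so a threshold chosen with a nominal 6 GB device in mind can be tripped by it. A minimal Python sketch of such a check follows; the function name and the 5.6 GiB threshold are hypothetical illustrations, not the app's actual code or value:

```python
GIB = 1024 ** 3  # bytes per gibibyte


def should_reduce_memory(physical_memory_bytes: int, threshold_bytes: int) -> bool:
    """Hypothetical heuristic: enable memory reduction below a RAM threshold."""
    return physical_memory_bytes < threshold_bytes


# Value reported in this thread for iOS 17 beta 7 on an iPhone 13 Pro.
reported_bytes = 5_917_753_344

# Made-up threshold for illustration only; the app's real value is not given here.
hypothetical_threshold_bytes = int(5.6 * GIB)

reported_gib = reported_bytes / GIB
print(f"reported: {reported_gib:.2f} GiB")
print("reduce memory:", should_reduce_memory(reported_bytes, hypothetical_threshold_bytes))
```

Logging the reported byte count next to the threshold when the app starts would make this kind of silent default change easier to spot.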

To reduce ambiguity, I pushed this branch to replicate the benchmark conditions (to the best of my recollection) using the latest code.

Using that branch, I got the following results on 5 consecutive runs using Xcode 15.0 beta 7 on iPhone 13 Pro running iOS 17 beta 7 (21A5319a):

run        1     2     3     4     5
time (s)   9.8   9.5   9.0   9.1   9.6
it/s       2.31  2.36  2.49  2.48  2.24

The median of 9.5 s is faster than the original 12 s observed for the same device back in June.

Also note that I'm running the tests after a reboot, waiting for the device to cool, and with the device detached from Xcode.
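
To compare runs like these across devices or OS seeds, it can help to reduce each series to a few summary values. A small Python sketch using the five measurements reported above (the `summarize` helper is an illustrative name, not part of the app):

```python
from statistics import mean, median

# Measurements reported in this comment (iPhone 13 Pro, iOS 17 beta 7).
times_s = [9.8, 9.5, 9.0, 9.1, 9.6]
iters_per_sec = [2.31, 2.36, 2.49, 2.48, 2.24]


def summarize(values):
    """Return (min, median, mean, max) for a list of measurements."""
    return min(values), median(values), mean(values), max(values)


for label, values in (("time (s)", times_s), ("it/s", iters_per_sec)):
    lo, med, avg, hi = summarize(values)
    print(f"{label}: min={lo} median={med} mean={avg:.2f} max={hi}")
```

The median is less sensitive than the mean to a single slow run, for example one affected by thermal throttling.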

@pcuenca
Contributor

pcuenca commented Aug 28, 2023

I can also repeat the tests on seed 5 of Xcode if that's useful.

@kanaukou-google
Author

I checked out @pcuenca's PR and tried reproducing it on a cool, freshly rebooted iPhone 14 Pro (iOS 17 beta 4), detached from Xcode.

Below are the results for 5 consecutive runs (that is, I pressed the 'Generate' button 5 times, each time after processing had completed, without restarting the app or changing the prompt).

run        1     2     3     4     5
time (s)   7.9   7.9   7.9   7.9   8.0
it/s       2.69  2.69  2.68  2.69  2.69

The results look even better than the benchmark values! I wonder, though, whether such consistency across 5 runs (within 0.1 s for time and 0.01 it/s for throughput) is expected?
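
One way to sanity-check these numbers is to divide the step count by the measured throughput. A rough Python sketch, assuming that it/s covers only the denoising loop, that the benchmark branch runs 20 steps, and that the earlier 11 s run used the app default of 25 steps (none of which is confirmed in this thread):

```python
def denoise_seconds(steps: int, iters_per_sec: float) -> float:
    """Time spent in the denoising loop, assuming it/s covers only that loop."""
    return steps / iters_per_sec


# (label, steps, it/s, reported end-to-end seconds); the step counts are assumptions.
runs = [
    ("app defaults (25 steps assumed)", 25, 2.72, 11.0),
    ("benchmark branch (20 steps assumed)", 20, 2.69, 7.9),
]

for label, steps, rate, total_s in runs:
    loop_s = denoise_seconds(steps, rate)
    print(f"{label}: loop ~{loop_s:.2f} s, remainder ~{total_s - loop_s:.2f} s")
```

Under those assumptions the loop accounts for about 7.4 s of the 7.9 s total on the branch, and about 9.2 s of the earlier 11 s, with the remainder spent outside the loop.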

@atiorh
Collaborator

atiorh commented Aug 28, 2023

Thank you both for the time spent on this @pcuenca @kanaukou-google! Our inference stack is consistently improving! We will rerun our internal benchmarks with the latest public seed of iOS 17 and update our numbers.

@kanaukou-google
Author

No problem, glad we figured this out! Just one quick question: what would be the best way to approach benchmarking in the future? It looks like some changes from @pcuenca's PR need to be applied to get proper results.

@pcuenca
Contributor

pcuenca commented Aug 28, 2023

That's a good point @kanaukou-google! I'll add a BENCHMARK constant to control the configuration, add an entry to the README and merge the PR. We'll still need to pay attention when we introduce features that may impact the benchmark code.

atiorh closed this as completed in a56e102 on Aug 30, 2023
@atiorh
Collaborator

atiorh commented Aug 30, 2023

Updated the benchmarks with the latest public seed using @pcuenca's benchmarking branch. Thanks @TBPer for the benchmarking runs!
