
Updates for Metal / GPU and performance improvements on Silicon Macs #295

Open: jasonkneen wants to merge 12 commits into main
Conversation

@jasonkneen (Contributor) commented Aug 13, 2024

Summary by Sourcery

Enhance performance on Silicon Macs by adding Metal support and updating default settings for video encoding and execution providers. Improve resource management and refactor code for better organization. Update documentation to reflect these changes.

New Features:

  • Introduce Metal support for improved performance on macOS devices, particularly Silicon Macs.

Enhancements:

  • Change default video encoder to 'libx265' and improve video quality by setting the default quality to 1.
  • Update default execution provider to 'coreml' for better performance on Apple devices.
  • Increase default maximum memory usage on macOS to 6GB and suggest 12 execution threads for better resource utilization.
  • Refactor image and video processing functions for better code organization and readability.
  • Improve webcam preview resolution to 1024x768 for better display quality.

Documentation:

  • Update README to reflect changes in execution provider usage, replacing 'coreml' with 'metal' for macOS devices.

sourcery-ai bot (Contributor) commented Aug 13, 2024

Reviewer's Guide by Sourcery

This pull request implements several updates for metal / GPU and performance improvements on Silicon Macs. The changes focus on optimizing the execution providers, adjusting default settings, and improving compatibility with Apple Silicon devices. Key modifications include forcing the use of CoreML as the execution provider, updating video processing methods, and enhancing GPU utilization for TensorFlow and PyTorch on macOS.

File-Level Changes

| File | Changes |
| --- | --- |
| modules/core.py | Force CoreML as the execution provider and remove support for other providers |
| modules/core.py | Update default settings for video encoding, quality, and frame handling |
| README.md | Implement Metal support for improved performance on macOS devices |
| modules/utilities.py | Modify frame extraction process to use OpenCV instead of ffmpeg |
| modules/ui.py | Increase webcam preview resolution and adjust frame rate |
| modules/processors/frame/face_swapper.py | Update face swapper model to use non-fp16 version |
| modules/core.py | Add configuration and testing for CoreML, TensorFlow with Metal, and PyTorch with MPS |


@sourcery-ai bot left a comment

Hey @jasonkneen - I've reviewed your changes - here's some feedback:

Overall Comments:

  • While the performance improvements for Silicon Macs are appreciated, consider maintaining better cross-platform compatibility. Some changes, like hardcoding CoreML as the execution provider, might negatively impact users on other platforms.
  • The switch from 'inswapper_128_fp16.onnx' to 'inswapper_128.onnx' needs more explanation. Please clarify the reasons for this change and any potential impacts on model performance or compatibility.
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟡 Complexity: 1 issue found
  • 🟡 Documentation: 2 issues found


```diff
@@ -44,7 +45,19 @@ def detect_fps(target_path: str) -> float:

 def extract_frames(target_path: str) -> None:
     temp_directory_path = get_temp_directory_path(target_path)
-    run_ffmpeg(['-i', target_path, '-pix_fmt', 'rgb24', os.path.join(temp_directory_path, '%04d.png')])
+    cap = cv2.VideoCapture(target_path)
```
suggestion (performance): Benchmark OpenCV vs ffmpeg for frame extraction

The extract_frames() function has been rewritten to use OpenCV instead of ffmpeg. While OpenCV offers more flexibility for image processing, ffmpeg is generally very efficient for video operations. This change could potentially impact performance, especially for large videos. Consider benchmarking this new implementation against the original ffmpeg-based one to ensure there's no significant performance regression.
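A minimal benchmarking sketch for comparing the two approaches (helper names and paths are hypothetical; assumes `opencv-python` and an `ffmpeg` binary are installed):

```python
import os
import subprocess
import time

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

def extract_with_ffmpeg(target_path, out_dir):
    # Original approach: let ffmpeg decode and write the PNG sequence.
    subprocess.run(['ffmpeg', '-i', target_path, '-pix_fmt', 'rgb24',
                    os.path.join(out_dir, '%04d.png')], check=True)

def extract_with_opencv(target_path, out_dir):
    # New approach: decode frame-by-frame with OpenCV and write each PNG.
    import cv2  # imported lazily so the timing helper works without OpenCV
    cap = cv2.VideoCapture(target_path)
    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        count += 1
        cv2.imwrite(os.path.join(out_dir, f'{count:04d}.png'), frame)
    cap.release()
    return count
```

Running `timed(extract_with_ffmpeg, video, dir_a)` and `timed(extract_with_opencv, video, dir_b)` on a representative clip would show whether the OpenCV path regresses.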

```diff
@@ -17,7 +17,7 @@

 def pre_check() -> bool:
     download_directory_path = resolve_relative_path('../models')
-    conditional_download(download_directory_path, ['https://huggingface.co/hacksider/deep-live-cam/blob/main/inswapper_128_fp16.onnx'])
```
suggestion (performance): Evaluate performance impact of model change

The model has been changed from 'inswapper_128_fp16.onnx' to 'inswapper_128.onnx'. This likely represents a shift from a 16-bit floating-point model to a 32-bit one, which could improve accuracy but increase memory usage and potentially slow down inference times. Consider evaluating and documenting the performance impact of this change, and possibly provide options for users to choose between accuracy and speed.

```python
def pre_check() -> bool:
    download_directory_path = resolve_relative_path('../models')
    models = [
        'inswapper_128.onnx',
        'inswapper_128_fp16.onnx'
    ]
    conditional_download(download_directory_path, [f'https://huggingface.co/hacksider/deep-live-cam/blob/main/{model}' for model in models])
    return True
```
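One way to act on this suggestion is to let users pick the precision explicitly (a sketch only; the helper name and flag are hypothetical, not the project's actual API):

```python
def pick_swapper_model(precision: str = 'fp32') -> str:
    """Map a user-facing precision choice to a model filename."""
    models = {
        'fp32': 'inswapper_128.onnx',       # higher accuracy, more memory
        'fp16': 'inswapper_128_fp16.onnx',  # faster, smaller footprint
    }
    if precision not in models:
        raise ValueError(f"unknown precision {precision!r}; expected 'fp32' or 'fp16'")
    return models[precision]
```

A `--model-precision` CLI flag could then feed this function, keeping both downloads useful instead of silently switching defaults.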

````diff
@@ -74,19 +74,20 @@ python run.py --execution-provider coreml
 ```

 ### [](https://github.com/s0md3v/roop/wiki/2.-Acceleration#coreml-execution-provider-apple-legacy)CoreML Execution Provider (Apple Legacy)
+Metal support has been added for improved performance on macOS devices.
````
suggestion (documentation): Consider making this statement more specific or removing if redundant.

The changes in the instructions already imply Metal support. You might want to provide more specific details about the performance improvements or remove this line if it doesn't add new information.

Suggested change:

```diff
-Metal support has been added for improved performance on macOS devices.
+CoreML with Metal acceleration is now supported, offering significant performance enhancements on compatible macOS devices.
```


````diff
 ```

 2. Usage in case the provider is available:

 ```
-python run.py --execution-provider coreml
+python run.py --execution-provider metal
````
question (documentation): Clarify if 'coreml' is still a valid execution provider option.

The command has changed from 'coreml' to 'metal'. Is 'coreml' still a valid option, or has it been completely replaced by 'metal'? This information might be helpful for users transitioning to the new system.
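If 'metal' is intended as a synonym rather than a replacement, a low-cost way to keep both spellings working is an alias table (a sketch; these names are assumptions, not the project's actual `decode_execution_providers`):

```python
PROVIDER_ALIASES = {
    'coreml': 'CoreMLExecutionProvider',
    'metal': 'CoreMLExecutionProvider',  # assumption: 'metal' maps to the same backend
    'cuda': 'CUDAExecutionProvider',
    'cpu': 'CPUExecutionProvider',
}

def decode_provider(name: str) -> str:
    """Translate a CLI spelling into an ONNX Runtime provider name."""
    try:
        return PROVIDER_ALIASES[name.lower()]
    except KeyError:
        raise ValueError(f'unknown execution provider: {name}') from None
```

With this, existing `--execution-provider coreml` invocations keep working while the docs switch to `metal`.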

```diff
@@ -66,88 +62,43 @@ def parse_args() -> None:
     modules.globals.video_encoder = args.video_encoder
     modules.globals.video_quality = args.video_quality
     modules.globals.max_memory = args.max_memory
-    modules.globals.execution_providers = decode_execution_providers(args.execution_provider)
+    modules.globals.execution_providers = ['CoreMLExecutionProvider']  # Force CoreML
     modules.globals.execution_threads = args.execution_threads
```
issue (complexity): Consider refactoring to reduce complexity and improve maintainability.

Hardcoding CoreML removes the flexibility of the previous dynamic provider selection, and the added ONNX Runtime, TensorFlow, and PyTorch setup logic has grown the code considerably. Several of those checks are redundant if the focus is ONNX Runtime with CoreML, and mixing configuration logic into the main execution flow, with multiple try-except blocks, hurts readability and violates separation of concerns. Consider refactoring to keep provider selection flexible, remove the redundant checks, and move configuration out of the main flow.
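A minimal sketch of the suggested separation (names are hypothetical): provider selection lives in one pure function, so platform logic stays out of `parse_args` and non-macOS users keep a working fallback:

```python
import platform
from typing import List, Optional

def select_execution_providers(requested: Optional[List[str]] = None) -> List[str]:
    """Pick ONNX Runtime providers in priority order, with a CPU fallback."""
    if requested:                       # honour an explicit user choice
        return list(requested)
    if platform.system() == 'Darwin':   # prefer CoreML on macOS
        return ['CoreMLExecutionProvider', 'CPUExecutionProvider']
    return ['CPUExecutionProvider']
```

`parse_args` would then call `select_execution_providers(args.execution_provider)` instead of hardcoding the list inline.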

@snacsnoc

The package customtkinter is still needed to launch the UI, otherwise:

```
python run.py --execution-provider coreml
Traceback (most recent call last):
  File "/Users/easto/deep-live-cam-tmp/Deep-Live-Cam/run.py", line 3, in <module>
    from modules import core
  File "/Users/easto/deep-live-cam-tmp/Deep-Live-Cam/modules/core.py", line 22, in <module>
    import modules.ui as ui
  File "/Users/easto/deep-live-cam-tmp/Deep-Live-Cam/modules/ui.py", line 3, in <module>
    import customtkinter as ctk
ModuleNotFoundError: No module named 'customtkinter'
```

@hvmzx commented Aug 14, 2024

I have tried your PR on the latest commit, on an M2 Pro, and still get very low FPS. I don't hear my fans kick in and my GPU usage is pretty low; do you think there is any way to improve it?

@snacsnoc commented Aug 14, 2024 via email

````diff
@@ -74,19 +74,20 @@ python run.py --execution-provider coreml
 ```

 ### [](https://github.com/s0md3v/roop/wiki/2.-Acceleration#coreml-execution-provider-apple-legacy)CoreML Execution Provider (Apple Legacy)
+Metal support has been added for improved performance on macOS devices.
````

Wrong section? This is listed under "Apple Legacy" when it should be under Apple Silicon, maybe?

@cdrage commented Aug 19, 2024

Checking activity monitor, this still uses the CPU only. Compared to nsfw-roop where it does use the GPU.

> I have tried your PR on the latest commit, on an M2 Pro, and still get very low FPS. I don't hear my fans kick in and my GPU usage is pretty low, do you think there is any way to improve it?

Getting the same on mac M1

@snacsnoc commented Aug 21, 2024

@jasonkneen

Issue 1

Running fails since the detection size has changed:

```python
def get_face_analyser() -> Any:
    global FACE_ANALYSER

    if FACE_ANALYSER is None:
        FACE_ANALYSER = insightface.app.FaceAnalysis(name='buffalo_l', providers=modules.globals.execution_providers)
        FACE_ANALYSER.prepare(ctx_id=0, det_size=(1280, 720))
    return FACE_ANALYSER
```

Reverting back to

```python
        FACE_ANALYSER.prepare(ctx_id=0, det_size=(640, 640))
```

works.

Error:

```
(venv) [easto@MacBook-Pro][/tmp/Deep-Live-Cam]$ python run.py --execution-provider coreml --execution-threads 12
Frame processor face_enhancer not found
Downloading: 56.0kB [00:00, 213kB/s]
ONNX Runtime version: 1.16.3
Available execution providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider']
Selected execution provider: CoreMLExecutionProvider (with CPU fallback for face detection)
TensorFlow devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
TensorFlow is using GPU (Metal)
PyTorch is using MPS (Metal Performance Shaders)
Frame processor face_enhancer not found
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (1280, 720)
2024-08-21 12:28:23.815556 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running CoreML_9447659792891585317_6 node. Name:'CoreMLExecutionProvider_CoreML_9447659792891585317_6_6' Status Message: Exception: /Users/cansik/git/private/onnxruntime-silicon/onnxruntime/onnxruntime/core/providers/coreml/model/model.mm:63 InlinedVector<int64_t> (anonymous namespace)::GetStaticOutputShape(gsl::span<const int64_t>, gsl::span<const int64_t>, const logging::Logger &) inferred_shape.size() == coreml_static_shape.size() was false. CoreML static output shape ({1,1,1,7200,1}) and inferred shape ({3200,1}) have different ranks.
```

```
Exception in Tkinter callback
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/[email protected]/3.11.9_1/Frameworks/Python.framework/Versions/3.11/lib/python3.11/tkinter/__init__.py", line 1967, in __call__
    return self.func(*args)
           ^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/venv/lib/python3.11/site-packages/customtkinter/windows/widgets/ctk_button.py", line 554, in _clicked
    self._command()
  File "/private/tmp/Deep-Live-Cam/modules/ui.py", line 97, in <lambda>
    start_button = ctk.CTkButton(root, text='Start', cursor='hand2', command=lambda: select_output_path(start))
                                                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/modules/ui.py", line 194, in select_output_path
    start()
  File "/private/tmp/Deep-Live-Cam/modules/core.py", line 135, in start
    if not frame_processor.pre_start():
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/modules/processors/frame/face_swapper.py", line 28, in pre_start
    elif not get_one_face(cv2.imread(modules.globals.source_path)):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/modules/face_analyser.py", line 20, in get_one_face
    face = get_face_analyser().get(frame)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/venv/lib/python3.11/site-packages/insightface/app/face_analysis.py", line 59, in get
    bboxes, kpss = self.det_model.detect(img,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/venv/lib/python3.11/site-packages/insightface/model_zoo/retinaface.py", line 224, in detect
    scores_list, bboxes_list, kpss_list = self.forward(det_img, self.det_thresh)
                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/venv/lib/python3.11/site-packages/insightface/model_zoo/retinaface.py", line 152, in forward
    net_outs = self.session.run(self.output_names, {self.input_name : blob})
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/Deep-Live-Cam/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running CoreML_9447659792891585317_6 node. Name:'CoreMLExecutionProvider_CoreML_9447659792891585317_6_6' Status Message: Exception: /Users/cansik/git/private/onnxruntime-silicon/onnxruntime/onnxruntime/core/providers/coreml/model/model.mm:63 InlinedVector<int64_t> (anonymous namespace)::GetStaticOutputShape(gsl::span<const int64_t>, gsl::span<const int64_t>, const logging::Logger &) inferred_shape.size() == coreml_static_shape.size() was false. CoreML static output shape ({1,1,1,7200,1}) and inferred shape ({3200,1}) have different ranks.
```
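The rank-mismatch error above appears to come from CoreML pre-compiling the detector with a static output shape for the size the model was converted with. A defensive sketch (a hypothetical helper, not code from this PR) that keeps the known-good 640x640 whenever CoreML is active:

```python
def safe_det_size(requested, providers):
    """Fall back to the detector's default size when CoreML is active.

    Assumption: CoreML pre-compiles static output shapes, so a det_size
    other than the 640x640 the model was converted with can trigger a
    rank mismatch like the one logged above.
    """
    DEFAULT = (640, 640)
    if 'CoreMLExecutionProvider' in providers and tuple(requested) != DEFAULT:
        return DEFAULT
    return tuple(requested)
```

`FACE_ANALYSER.prepare(ctx_id=0, det_size=safe_det_size((1280, 720), providers))` would then reproduce the working revert automatically on CoreML while allowing larger sizes elsewhere.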

Issue 2

Additionally, `nsfw` is missing from `modules/globals`:

```
Traceback (most recent call last):
  File "/private/tmp/Deep-Live-Cam/run.py", line 6, in <module>
    core.run()
  File "/private/tmp/Deep-Live-Cam/modules/core.py", line 247, in run
    start()
  File "/private/tmp/Deep-Live-Cam/modules/core.py", line 154, in start
    if modules.globals.nsfw == False:
       ^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'modules.globals' has no attribute 'nsfw'
```

Adding `nsfw = None` fixes the error.
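Besides adding `nsfw = None` to `modules/globals`, the call site could read the attribute defensively so older globals modules don't crash (a sketch; `nsfw_flag` is a hypothetical helper, not code from this PR):

```python
import types

def nsfw_flag(globals_module) -> bool:
    """Read a globals module's nsfw attribute, defaulting to False if absent."""
    return bool(getattr(globals_module, 'nsfw', False))

# Illustration with stand-ins for modules.globals:
legacy = types.SimpleNamespace()          # no nsfw attribute at all
current = types.SimpleNamespace(nsfw=True)
```

`if not nsfw_flag(modules.globals): ...` would then behave like the `nsfw == False` check without the AttributeError.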

GPU Usage

Lastly, this still fails to use the GPU on Apple Silicon (MacBook Pro M2 Max):

```
Frame processor face_enhancer not found
ONNX Runtime version: 1.16.3
Available execution providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider']
Selected execution provider: CoreMLExecutionProvider (with CPU fallback for face detection)
TensorFlow devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
TensorFlow is using GPU (Metal)
PyTorch is using MPS (Metal Performance Shaders)
Frame processor face_enhancer not found
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/1k3d68.onnx landmark_3d_68 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/2d106det.onnx landmark_2d_106 ['None', 3, 192, 192] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/det_10g.onnx detection [1, 3, '?', '?'] 127.5 128.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/genderage.onnx genderage ['None', 3, 96, 96] 0.0 1.0
Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
find model: /Users/easto/.insightface/models/buffalo_l/w600k_r50.onnx recognition ['None', 3, 112, 112] 127.5 127.5
set det-size: (640, 640)
[DLC.CORE] Creating temp resources...
[DLC.CORE] Extracting frames...
Frame processor face_enhancer not found
[DLC.FACE-SWAPPER] Progressing...
Processing:   0%|                                                                         | 0/1626 [00:00<?, ?frame/s, execution_providers=['CoreMLExecutionProvider'], execution_threads=12, max_memory=6]Applied providers: ['CoreMLExecutionProvider', 'CPUExecutionProvider'], with options: {'CPUExecutionProvider': {}, 'CoreMLExecutionProvider': {}}
inswapper-shape: [1, 3, 128, 128]
Processing:  63%|███████████████████████████████████████                       | 1023/1626 [01:21<00:46, 12.97frame/s, execution_providers=['CoreMLExecutionProvider'], execution_threads=12, max_memory=6]
```
[Screenshot 2024-08-21 at 12:48:04]

@cdrage commented Sep 19, 2024

@hacksider this should not be merged yet as GPU support does not work for mac (still uses CPU)

@jasonkneen (Contributor, Author) commented
Mine uses metal and the GPU.

@cdrage commented Sep 20, 2024

> Mine uses metal and the GPU.

The five other users in this PR and I don't see the GPU being used in Activity Monitor, and it's very slow, even with your latest commit :(
