Updates for Metal / GPU and performance improvements on Silicon Macs #295
base: main
Conversation
Reviewer's Guide by Sourcery

This pull request implements several updates for Metal / GPU and performance improvements on Silicon Macs. The changes focus on optimizing the execution providers, adjusting default settings, and improving compatibility with Apple Silicon devices. Key modifications include forcing the use of CoreML as the execution provider, updating video processing methods, and enhancing GPU utilization for TensorFlow and PyTorch on macOS.

File-Level Changes
Hey @jasonkneen - I've reviewed your changes - here's some feedback:
Overall Comments:
- While the performance improvements for Silicon Macs are appreciated, consider maintaining better cross-platform compatibility. Some changes, like hardcoding CoreML as the execution provider, might negatively impact users on other platforms.
- The switch from 'inswapper_128_fp16.onnx' to 'inswapper_128.onnx' needs more explanation. Please clarify the reasons for this change and any potential impacts on model performance or compatibility.
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟡 Documentation: 2 issues found
Help me be more useful! Please click 👍 or 👎 on each comment to tell me if it was helpful.
```diff
@@ -44,7 +45,19 @@ def detect_fps(target_path: str) -> float:

 def extract_frames(target_path: str) -> None:
     temp_directory_path = get_temp_directory_path(target_path)
-    run_ffmpeg(['-i', target_path, '-pix_fmt', 'rgb24', os.path.join(temp_directory_path, '%04d.png')])
+    cap = cv2.VideoCapture(target_path)
```
suggestion (performance): Benchmark OpenCV vs ffmpeg for frame extraction
The `extract_frames()` function has been rewritten to use OpenCV instead of ffmpeg. While OpenCV offers more flexibility for image processing, ffmpeg is generally very efficient for video operations. This change could impact performance, especially for large videos. Consider benchmarking the new implementation against the original ffmpeg-based one to make sure there is no significant performance regression.
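A minimal timing harness (hypothetical, not part of the PR) could make such a comparison concrete; `extract_fn` stands in for either implementation:

```python
import time

def benchmark(label: str, extract_fn, target_path: str) -> float:
    # Time a single frame-extraction run and report the wall-clock cost.
    start = time.perf_counter()
    extract_fn(target_path)
    elapsed = time.perf_counter() - start
    print(f'{label}: {elapsed:.2f}s')
    return elapsed
```

Running it once with the ffmpeg-based function and once with the OpenCV-based one, on the same video, would show directly whether the rewrite regresses.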
```diff
@@ -17,7 +17,7 @@

 def pre_check() -> bool:
     download_directory_path = resolve_relative_path('../models')
-    conditional_download(download_directory_path, ['https://huggingface.co/hacksider/deep-live-cam/blob/main/inswapper_128_fp16.onnx'])
+    conditional_download(download_directory_path, ['https://huggingface.co/hacksider/deep-live-cam/blob/main/inswapper_128.onnx'])
```
suggestion (performance): Evaluate performance impact of model change
The model has been changed from 'inswapper_128_fp16.onnx' to 'inswapper_128.onnx'. This likely represents a shift from a 16-bit floating-point model to a 32-bit one, which could improve accuracy but increase memory usage and potentially slow down inference times. Consider evaluating and documenting the performance impact of this change, and possibly provide options for users to choose between accuracy and speed.
```python
def pre_check() -> bool:
    download_directory_path = resolve_relative_path('../models')
    models = [
        'inswapper_128.onnx',
        'inswapper_128_fp16.onnx'
    ]
    conditional_download(download_directory_path, [f'https://huggingface.co/hacksider/deep-live-cam/blob/main/{model}' for model in models])
    return True
```
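Since both files would then be downloaded, a small helper (hypothetical; `select_model` is not part of this PR) could expose the accuracy/speed trade-off as a user-facing option:

```python
# Hypothetical helper: let users pick which downloaded model to load.
def select_model(precision: str = 'fp16') -> str:
    models = {
        'fp16': 'inswapper_128_fp16.onnx',  # half precision: smaller, faster
        'fp32': 'inswapper_128.onnx',       # full precision: more memory, possibly more accurate
    }
    if precision not in models:
        raise ValueError(f'unknown precision: {precision!r}')
    return models[precision]
```

A `--model-precision fp16|fp32` CLI flag could then feed this helper, keeping fp16 as the speed-friendly default.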
````diff
@@ -74,19 +74,20 @@ python run.py --execution-provider coreml
 ```

 ### [](https://github.com/s0md3v/roop/wiki/2.-Acceleration#coreml-execution-provider-apple-legacy)CoreML Execution Provider (Apple Legacy)
+Metal support has been added for improved performance on macOS devices.
````
suggestion (documentation): Consider making this statement more specific or removing if redundant.
The changes in the instructions already imply Metal support. You might want to provide more specific details about the performance improvements or remove this line if it doesn't add new information.
```diff
-Metal support has been added for improved performance on macOS devices.
+CoreML with Metal acceleration is now supported, offering significant performance enhancements on compatible macOS devices.
```
````diff
 ```

 2. Usage in case the provider is available:

 ```
-python run.py --execution-provider coreml
+python run.py --execution-provider metal
````
question (documentation): Clarify if 'coreml' is still a valid execution provider option.
The command has changed from 'coreml' to 'metal'. Is 'coreml' still a valid option, or has it been completely replaced by 'metal'? This information might be helpful for users transitioning to the new system.
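One way to check this locally (a sketch; `provider_short_names` is a hypothetical helper, and it assumes roop-style CLIs derive flags like `coreml` from provider class names) is to list what the installed runtime actually exposes. Note that ONNX Runtime's Apple provider is named `CoreMLExecutionProvider`, so a `metal` flag would have to be an alias defined by this project rather than a runtime provider name:

```python
def provider_short_names() -> list:
    # List short CLI-style names for the providers the local runtime supports.
    try:
        import onnxruntime
        providers = onnxruntime.get_available_providers()
    except ImportError:
        providers = ['CPUExecutionProvider']  # assume CPU-only if the runtime is absent
    # Derive flags such as 'coreml' by stripping the common suffix.
    return [p.replace('ExecutionProvider', '').lower() for p in providers]
```

On a Mac with `onnxruntime-silicon` (or a CoreML-enabled build), the list should include `coreml`; if it does not, the flag rename alone cannot enable GPU use.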
```diff
@@ -66,88 +62,43 @@ def parse_args() -> None:
     modules.globals.video_encoder = args.video_encoder
     modules.globals.video_quality = args.video_quality
     modules.globals.max_memory = args.max_memory
-    modules.globals.execution_providers = decode_execution_providers(args.execution_provider)
+    modules.globals.execution_providers = ['CoreMLExecutionProvider']  # Force CoreML
     modules.globals.execution_threads = args.execution_threads
```
issue (complexity): Consider refactoring to reduce complexity and improve maintainability.
The new code introduces increased complexity due to hardcoding the use of CoreML, reducing flexibility compared to the previous dynamic selection of execution providers. The code size has grown with additional logic for ONNX Runtime, TensorFlow, and PyTorch, making it harder to maintain. There are redundant checks and configurations, especially for TensorFlow and PyTorch, which may not be necessary if the focus is on ONNX Runtime with CoreML. The mixing of configuration logic with the main execution flow violates the separation of concerns principle, complicating readability and maintenance. Multiple try-except blocks for error handling clutter the code. Consider refactoring to maintain flexibility, reduce redundancy, and separate configuration logic from the main flow.
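As a sketch of that refactor (hypothetical helper, not code from the PR), provider selection could stay dynamic while still degrading gracefully when CoreML is unavailable:

```python
def choose_execution_providers(requested: list) -> list:
    # Honour the user's requested providers when the local onnxruntime
    # build supports them; otherwise fall back to the CPU provider.
    try:
        import onnxruntime
        available = onnxruntime.get_available_providers()
    except ImportError:
        available = ['CPUExecutionProvider']
    chosen = [p for p in requested if p in available]
    return chosen or ['CPUExecutionProvider']
```

On a Mac with a CoreML-enabled onnxruntime build, `choose_execution_providers(['CoreMLExecutionProvider'])` keeps CoreML; on other platforms the same call falls back to CPU instead of failing, which avoids the cross-platform breakage of hardcoding.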
I have tried your PR on the latest commit, on an M2 Pro, and still get very low FPS. I don't hear my fans kick in and my GPU usage is pretty low; do you think there is any way to improve it?
Checking activity monitor, this still uses the CPU only, compared to nsfw-roop, where it does use the GPU.
````diff
@@ -74,19 +74,20 @@ python run.py --execution-provider coreml
 ```

 ### [](https://github.com/s0md3v/roop/wiki/2.-Acceleration#coreml-execution-provider-apple-legacy)CoreML Execution Provider (Apple Legacy)
+Metal support has been added for improved performance on macOS devices.
````
Wrong section? This is listed under "Apple Legacy" when it should maybe be under Apple Silicon?
Getting the same on Mac M1
Issue 1: Running fails since the detection size has changed:

Reverting back to
works.

Issue 2: Additionally,

Adding GPU Usage: Lastly, this still fails to use the GPU on Apple Silicon (MacBook Pro M2 Max):
@hacksider this should not be merged yet, as GPU support does not work on Mac (it still uses the CPU)

Mine uses Metal and the GPU.

The five other users in this PR and I don't see the GPU being used in Activity Monitor, and it's very slow, even with your latest commit :(
Summary by Sourcery
Enhance performance on Silicon Macs by adding Metal support and updating default settings for video encoding and execution providers. Improve resource management and refactor code for better organization. Update documentation to reflect these changes.
New Features:
Enhancements:
Documentation: