
Multi-GPU Training Support #42

Merged · 19 commits from feature/multiple_gpu_testing into dev · Jul 11, 2024
Conversation

MatejRojec (Contributor):

  • Modify the backbone models to support multi-GPU training using the DDP (Distributed Data Parallel) strategy.
  • Resolve the ddp_find_unused_parameters_true error, which occurs when some backbone components are not used in the forward pass (see the sketch below).
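
For context, a minimal hypothetical sketch (module and variable names invented, not the actual diff) of why plain DDP fails when a backbone sub-module is skipped in forward(), and one way to resolve it without falling back to the ddp_find_unused_parameters_true strategy:

import torch.nn as nn

class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.blocks = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1))
        # A classifier head that forward() never calls leaves its parameters
        # without gradients, which makes plain DDP raise an error:
        self.classifier = nn.Linear(64, 1000)

    def forward(self, x):
        return self.blocks(self.stem(x))

backbone = Backbone()
del backbone.classifier  # drop the unused sub-module so DDP only tracks parameters that receive gradients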


github-actions bot commented Jun 19, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines  Covered  Coverage  Threshold  Status
4858   3717     77%       0%         🟢

New Files

No new covered files...

Modified Files

File                                  Coverage  Status
luxonis_train/nodes/efficientnet.py  30%       🟢
luxonis_train/nodes/mobilenetv2.py   33%       🟢
luxonis_train/nodes/mobileone.py     12%       🟢
luxonis_train/nodes/resnet.py        96%       🟢
luxonis_train/nodes/rexnetv1.py      15%       🟢
TOTAL                                 38%       🟢

updated for commit: e30fb37 by action🐍

@kozlov721 (Collaborator) left a comment:

LGTM

x = self.backbone.conv_head(x)
x = self.backbone.bn2(x)
x = self.backbone.act2(x)
x = self.backbone.global_pool(x)

Collaborator:

Is there a reason we append the flattened features here instead of before the global pool?
It also seems we have a list of size 5 here?

Collaborator:

Whereas elsewhere it's a list of size 4?

MatejRojec (Contributor, author):

Thanks, I think it should be added before the global pool.
Regarding the list size: would it make sense to remove one of the out_indices, for example index 2, so we end up with a list of size 4?
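
A minimal sketch of the reordering being discussed (the features list name is assumed; this is not the actual diff):

x = self.backbone.conv_head(x)
x = self.backbone.bn2(x)
x = self.backbone.act2(x)
features.append(x)  # append the feature map before pooling, as suggested
x = self.backbone.global_pool(x)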

Collaborator:

Maybe we should check the paper to see whether it defines "blocks", to find out which features we should take?


Test Results

  4 files    4 suites   1h 0m 37s ⏱️
 58 tests  33 ✅  25 💤 0 ❌
232 runs  132 ✅ 100 💤 0 ❌

Results for commit 267936b.

@kozlov721 kozlov721 changed the title (Feature) Support for multiple gpu testing Multi-GPU Training Support Jun 21, 2024
@kozlov721 kozlov721 added the enhancement New feature or request label Jun 21, 2024
@tersekmatija (Collaborator):

@kozlov721 Do we require anywhere downstream that the backbone outputs exactly 4 feature maps?

@kozlov721 (Collaborator):

> @kozlov721 Do we require anywhere downstream that the backbone outputs exactly 4 feature maps?

No, I don't think so

@tersekmatija (Collaborator):

How would the neck accept this then? The last 3 feature maps, for example? @kozlov721

@kozlov721 (Collaborator):

> How would the neck accept this then? The last 3 feature maps, for example? @kozlov721

The RepPANNeck needs at least as many feature maps as the value of the num_heads parameter (3 by default). It then uses the feature maps taken from the end of the list.
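
A minimal sketch of that selection logic (select_neck_inputs is a hypothetical helper, not RepPANNeck's actual implementation):

def select_neck_inputs(features: list, num_heads: int = 3) -> list:
    # The neck consumes the last num_heads feature maps from the backbone.
    if len(features) < num_heads:
        raise ValueError(
            f"neck needs at least {num_heads} feature maps, got {len(features)}"
        )
    return features[-num_heads:]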

@tersekmatija tersekmatija self-requested a review July 10, 2024 06:49
@tersekmatija (Collaborator):

Okay, I think we can merge this then.

@kozlov721 kozlov721 merged commit db24760 into dev Jul 11, 2024
@kozlov721 kozlov721 deleted the feature/multiple_gpu_testing branch July 11, 2024 13:43
kozlov721 pushed a commit that referenced this pull request Oct 9, 2024