NaN #26

Open
cymdhx opened this issue Apr 1, 2021 · 28 comments

Comments

@cymdhx

cymdhx commented Apr 1, 2021

When I use involution in training, I get NaN:
[screenshot]

@d-li14
Owner

d-li14 commented Apr 2, 2021

Please specify the experimental details.

@cymdhx
Author

cymdhx commented Apr 2, 2021

I added it to a YOLO network, replacing the conv layers in the PANet neck with involution. With conv the loss never becomes NaN, but after switching to involution it does.

@cymdhx
Author

cymdhx commented Apr 2, 2021

like this:
[screenshot]

@cymdhx
Author

cymdhx commented Apr 2, 2021

Is there any way to solve this? I tried scaling the loss down, but it didn't help.

@d-li14
Owner

d-li14 commented Apr 2, 2021

You may try gradient clipping, which we also use sometimes when training our detection models; see, for example, https://github.com/d-li14/involution/blob/main/det/configs/involution/retinanet_red50_neck_fpn_1x_coco.py#L8
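For reference, a minimal sketch of gradient clipping in a plain PyTorch training step; the model and the max_norm value here are illustrative assumptions, and the linked MMDetection config achieves the same effect declaratively through its optimizer_config entry.

```python
import torch
import torch.nn as nn

# Illustrative model/optimizer; substitute your own network.
model = nn.Conv2d(16, 16, 3, padding=1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(2, 16, 32, 32)
loss = model(x).pow(2).mean()

optimizer.zero_grad()
loss.backward()
# Clip the global gradient norm just before the update; max_norm=35.0 is
# an assumed value, not necessarily the one in the linked config.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=35.0)
optimizer.step()
```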

@cymdhx
Author

cymdhx commented Apr 5, 2021

Thank you so much.

@cymdhx
Author

cymdhx commented Apr 5, 2021

[screenshot]
Even after I applied gradient clipping, I still seem to get NaN.

@songyonger

I replaced the conv in the resblocks of the super-resolution model EDSR with involution and used the gradient clipping method, but the loss is still inf.

@cymdhx
Author

cymdhx commented Apr 7, 2021

Have you solved it yet?

@songyonger

Not yet.

@cymdhx
Author

cymdhx commented Apr 7, 2021

Me neither; we could discuss it.

@545088212

Have any of you solved this yet?

@NNPanNPU

My loss on the training set is fine, but on the validation set some batches are NaN.
It's definitely not gradient explosion. I don't know how to locate the problem and debug it.
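Not from the thread, but a common way to localize where NaNs first appear is PyTorch's anomaly detection together with explicit finiteness checks on each validation batch; a minimal sketch (the helper name is made up):

```python
import torch

# Debugging only (slow): error out at the first backward op producing
# NaN/Inf, with a traceback pointing at the responsible forward op.
torch.autograd.set_detect_anomaly(True)

def check_finite(name: str, tensor: torch.Tensor) -> None:
    # Raise as soon as any activation, logit, or loss goes non-finite.
    if not torch.isfinite(tensor).all():
        raise RuntimeError(f"non-finite values detected in {name}")

# Example: call check_finite("val_loss", loss) inside the validation loop.
```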

@songwaimai

Maybe your dataset isn't clean?

@songwaimai

I also met this problem in a generation task. I replaced the 3x3 conv with involution, and the loss is NaN or inf.

@cymdhx
Author

cymdhx commented Apr 15, 2021

I haven't solved it either, so I'm about to give up on using involution.

@songwaimai

I also tried the gradient clipping method, but the NaN problem was not solved; I will try to find some other methods that may work.

@cymdhx
Author

cymdhx commented Apr 15, 2021

I tried the gradient clipping method too, but it didn't work. If you find any good methods, please share them. Thank you.

@lygsbw

lygsbw commented Apr 15, 2021

#26 (comment)

I also met the same problem when dealing with the pose estimation task.

@songwaimai

OK.

@LJill

LJill commented Apr 27, 2021

[screenshot]
When I replaced the CA module in RCAN with involution, the loss was also very large.

@songyonger

I replaced the standard conv with involution and added BN, and then the loss seems normal. But the final result is worse than the EDSR baseline with BN layers, even though I increased the parameter count of the EDSR-involution model. I have given up for now. You can have a try and we can talk. @LJill
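A minimal sketch of the involution-plus-BN replacement described above, assuming the pure-PyTorch involution module from this repo with a signature like involution(channels, kernel_size, stride); the wrapper name is my own:

```python
import torch.nn as nn
from involution_naive import involution  # this repo's PyTorch implementation


class InvolutionBN(nn.Module):
    """Drop-in replacement for a 3x3 conv: involution followed by BN."""

    def __init__(self, channels: int, kernel_size: int = 7, stride: int = 1):
        super().__init__()
        self.inv = involution(channels, kernel_size, stride)
        self.bn = nn.BatchNorm2d(channels)  # keeps activations in range

    def forward(self, x):
        return self.bn(self.inv(x))
```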

@LJill

LJill commented Apr 27, 2021

Thanks for your reply. I tried your method on EDSR and RCAN; it works, and the loss is normal now. I will run experiments to observe the final result.

@songwaimai

When I replace the conv with involution and add BN, the training loss seems normal, but the validation loss is still NaN. Has this happened to your model?

@whf9527

whf9527 commented May 5, 2021

After I switched to involution, the parameters don't seem to optimize: the training loss keeps decreasing, but the validation loss stays at one value. Does anyone know whether this is overfitting or a code error?
I don't think it's overfitting, because the training loss decreases while the validation loss barely changes. I still haven't solved this.

@whf9527

whf9527 commented May 6, 2021

What causes this problem? I met this issue too: the training loss improves, but the validation loss is unchanged.

@ChristophReich1996

I implemented a pure PyTorch 2D involution and faced a similar issue of NaNs occurring during training when using involution as a plug-in replacement for convolutions. In my case this was caused by exploding activations. The issue was solved by using a higher momentum (0.3) in the batch-normalization layer (after the reduction). I guess the distribution of the activations changes so much that batch norm, with track_running_stats=True and momentum=0.1, cannot follow the shifting distribution, resulting in exploding activations. This was my conclusion after looking at the PyTorch batch-norm implementation, which also uses the running stats for normalization during training (correct me if I'm wrong).
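Concretely, the fix above only changes the momentum of the BatchNorm inside the kernel-generation branch; a minimal sketch of such a branch (channel counts, reduction ratio, and layer names are assumptions, not this repo's exact code):

```python
import torch.nn as nn

channels, reduction, kernel_size, groups = 64, 4, 7, 4

# reduce -> BN -> ReLU -> span, as in a typical involution kernel branch.
# momentum=0.3 (vs. the 0.1 default) lets the running stats track the
# quickly shifting activation distribution, per the comment above.
kernel_branch = nn.Sequential(
    nn.Conv2d(channels, channels // reduction, kernel_size=1),
    nn.BatchNorm2d(channels // reduction, momentum=0.3, track_running_stats=True),
    nn.ReLU(inplace=True),
    nn.Conv2d(channels // reduction, kernel_size * kernel_size * groups, kernel_size=1),
)
```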

@weiguangzhao

weiguangzhao commented May 24, 2021

@cymdhx @songwaimai @whf9527

I solved the NaN problem I encountered; here is my solution, though I don't know whether it applies to your cases:
Problem description: when Unet + resnet was changed to Unet + rednet50, NaN and inf appeared.
Solution: remove the following code from the program; do not initialize the BatchNorm weight and bias manually.

```python
def set_bn_init(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        m.weight.data.fill_(1.0)
        m.bias.data.fill_(0.0)
```
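For context, an init hook like this is normally applied over the whole network with Module.apply; dropping that call (the fix above) leaves BatchNorm at PyTorch's default initialization. A minimal sketch of the pattern being removed, with a placeholder model:

```python
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16))  # placeholder

# This is the call the fix removes: it forces every BatchNorm's affine
# parameters via the set_bn_init hook defined above.
model.apply(set_bn_init)
```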
