ValueError: Expected more than 1 value per channel when training, got input size 1 #141
Comments
If you don't use sync BN, make sure the batch size per GPU is greater than 1.
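For anyone who wants to see the rule above in isolation, here is a minimal sketch with plain (non-sync) BatchNorm. The `branch` module is a hypothetical stand-in for a single pooling branch, not the repository's code, and the channel count and input size are arbitrary. It pools to 1x1 and then applies BatchNorm, so with one sample there is exactly one value per channel and the training-mode check fails.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for one pooling branch (not the repository's module):
# global average pooling to 1x1, followed by BatchNorm. 8 channels is arbitrary.
branch = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.BatchNorm2d(8))


def try_forward(batch_size):
    """Run one forward pass; return None on success or the error message."""
    x = torch.randn(batch_size, 8, 16, 16)
    try:
        branch(x)
        return None
    except ValueError as err:
        return str(err)


branch.train()
err_bs1 = try_forward(1)  # 1 sample * 1x1 spatial = 1 value per channel
err_bs2 = try_forward(2)  # 2 samples * 1x1 spatial = 2 values per channel

branch.eval()
err_eval = try_forward(1)  # eval mode uses running statistics, so no check

print("train, batch 1:", err_bs1)
print("train, batch 2:", err_bs2)
print("eval,  batch 1:", err_eval)
```

Only the first call should report an error. This only covers the single-process case without sync BN; it says nothing about how the distributed run splits the batch.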
Thanks! However, I am using sync BN, and the same error is still reported when train_batch_size is set to 1. An error is also reported when train_batch_size is set to 2.
My configuration: PyTorch 1.4.0 (py3.7_cuda10.1.243_cudnn7.6.3_0) and two RTX 2080 Ti 11 GB GPUs. After modifying NGPUS and --gpu in run_fs_pspnet_cityscapes_seg.sh, the following error is reported:
2020-04-11 12:23:19,529 INFO [runner_helper.py, 38] Converting syncbn model...
2020-04-11 12:23:22,108 INFO [controller.py, 28] Training start...
Traceback (most recent call last):
File "main.py", line 185, in <module>
Controller.train(runner)
File "/home/zl/zhaoliu/fam/lib/runner/controller.py", line 46, in train
runner.train()
File "/home/zl/zhaoliu/fam/runner/seg/fcn_segmentor.py", line 86, in train
out = self.seg_net(data_dict)
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 447, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zl/zhaoliu/fam/model/seg/nets/pspnet.py", line 95, in forward
x = self.ppm(x)
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zl/zhaoliu/fam/model/seg/nets/pspnet.py", line 52, in forward
ppm_out.append(F.interpolate(pool_scale(x), (input_size[2], input_size[3]),
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/nn/modules/container.py", line 100, in forward
input = module(input)
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 473, in forward
self.eps, exponential_average_factor, process_group, world_size)
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/nn/modules/_functions.py", line 13, in forward
raise ValueError('Expected more than 1 value per channel when training, got input size {}'.format(size))
ValueError: Expected more than 1 value per channel when training, got input size 1
Traceback (most recent call last):
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/home/zl/.conda/envs/yolact-env/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zl/.conda/envs/yolact-env/bin/python', '-u', 'main.py', '--local_rank=1', '--config_file', 'configs/seg/cityscapes/base_fcn_cityscapes_seg.conf', '--phase', 'train', '--gpu', '0', '1', '--train_batch_size', '1', '--val_batch_size', '1', '--backbone', 'deepbase_resnet101_d8', '--model_name', 'pspnet', '--drop_last', 'y', '--syncbn', 'y', '--dist', 'y', '--data_dir', '/home/zl/zhaoliu/fam/DataSet/CityScapes', '--loss_type', 'dsnce_loss', '--max_iters', '40000', '--checkpoints_name', 'fs_pspnet_cityscapes_segtag', '--pretrained', './pretrained_models/3x3resnet101-imagenet.pth']' returned non-zero exit status 1.
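A rough way to read the "got input size 1" part of the message is sketched below. It assumes the check in `_functions.py` counts the elements per channel on the local process only (number of elements divided by the channel count), which is consistent with the message but is not confirmed in this thread. The helper `values_per_channel`, the 512 channels, and the pooling bin sizes (1, 2, 3, 6, as in the PSPNet paper) are illustrative assumptions; the repository's actual settings may differ.

```python
from math import prod


def values_per_channel(shape):
    """Elements per channel for an (N, C, H, W) shape: numel // C."""
    n, c, *spatial = shape
    return (n * c * prod(spatial)) // c


# Assumed values for illustration only: 512 channels and pyramid bin sizes
# (1, 2, 3, 6). Neither is taken from the repository's configuration.
counts = {}
for batch_per_gpu in (1, 2):
    for bins in (1, 2, 3, 6):
        count = values_per_channel((batch_per_gpu, 512, bins, bins))
        counts[(batch_per_gpu, bins)] = count
        status = "would fail the check" if count == 1 else "ok"
        print(f"batch/GPU={batch_per_gpu}, pooled to {bins}x{bins}: "
              f"{count} value(s) per channel, {status}")
```

Under these assumptions, only the 1x1 pooling branch with one sample per process ends up with a single value per channel, which would match the command above running with `--train_batch_size 1` across two GPUs.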