Shape mismatch when training #48

martinarroyo · 2023-05-17T16:34:33Z

Steps taken:

Cloned main branch today.
Downloaded ECSSD and MSRA10K.
Set up repository with PyTorch 1.8.
Launched bash train_eval.sh.

Right on the first training step, the following error is raised:

  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/aot-benchmark/tools/train.py", line 18, in main_worker
    trainer.sequential_training()
  File "./networks/managers/trainer.py", line 473, in sequential_training
    use_prev_prob=use_prev_prob)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "./networks/engines/aot_engine.py", line 52, in forward
    self.add_reference_frame(frame_step=0, obj_nums=obj_nums)
  File "./networks/engines/aot_engine.py", line 239, in add_reference_frame
    size_2d=self.enc_size_2d)
  File "./networks/models/aot.py", line 105, in LSTT_forward
    pos_emb, size_2d)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "./networks/layers/transformer.py", line 113, in forward
    size_2d=size_2d)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "./networks/layers/transformer.py", line 352, in forward
    tgt3 = self.short_term_attn(local_Q, local_K, local_V)[0]
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "./networks/layers/attention.py", line 525, in forward
    output = agg_value + agg_bias
RuntimeError: The size of tensor a (32) must match the size of tensor b (256) at non-singleton dimension 3
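For readers unfamiliar with this message: it is a broadcasting failure in the element-wise addition `agg_value + agg_bias`. The sketch below applies the standard broadcasting rule (right-align the shapes; two sizes are compatible if they are equal or one of them is 1) to locate the conflicting dimension. The helper name `first_broadcast_conflict` and the example shapes are hypothetical; the traceback only reveals the sizes 32 and 256 at dimension 3, not the full tensor shapes.

```python
def first_broadcast_conflict(shape_a, shape_b):
    """Return (dim, size_a, size_b) for the first incompatible dimension,
    scanning from the trailing dimension, or None if the shapes broadcast."""
    ndim = max(len(shape_a), len(shape_b))
    # Right-align both shapes by padding the shorter one with leading 1s.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    for dim in reversed(range(ndim)):
        # Sizes are compatible when they are equal or either one is 1.
        if a[dim] != b[dim] and 1 not in (a[dim], b[dim]):
            return dim, a[dim], b[dim]
    return None


# Hypothetical shapes: only the last dimension (32 vs 256) comes from the error.
conflict = first_broadcast_conflict((2, 8, 49, 32), (2, 8, 49, 256))
print(conflict)
```

A result of `None` means the two shapes can be added together; any other result names the dimension where the sizes disagree.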

I did not alter any of the configuration values. I noticed that #2 had the same issue, but no solution was provided there. Any help would be much appreciated!

Thanks in advance

94kiki · 2023-05-18T00:47:27Z

I also have this problem.

yoxu515 · 2023-05-19T10:55:49Z

Hello, sorry for the error. This bug occurs when "pytorch correlation" is not installed correctly, similar to issue #45. I recommend installing it; the code also runs faster with it.
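A quick way to confirm whether the extension is visible to the Python environment used for training is sketched below. It assumes that "pytorch correlation" refers to the Pytorch-Correlation-extension package and that it is importable as `spatial_correlation_sampler`; that module name is an assumption and should be adjusted if the installation differs. The helper `extension_available` is hypothetical and not part of the repository.

```python
import importlib.util


def extension_available(module_name="spatial_correlation_sampler"):
    """Return True if `module_name` can be located without importing it.

    The default module name is an assumption about how the correlation
    extension is installed; pass a different name if needed.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if extension_available():
        print("Correlation extension found.")
    else:
        print("Correlation extension not found; install it before training.")
```

This only checks that the module can be located. A build compiled against a different PyTorch or CUDA version may still fail at import time.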

94kiki · 2023-05-21T09:09:48Z

Thank you very much!

martinarroyo · 2023-05-23T11:21:20Z

Thanks for the clarification. I am now testing this and will report back. In the meantime, I updated the README to explain this situation to future users.

martinarroyo added a commit to martinarroyo/aot-benchmark that referenced this issue May 23, 2023

Update README.md

231a416

Updates the description of PyTorch correlation to note that it is required for training (see yoxu515#48)

martinarroyo mentioned this issue May 23, 2023

Update README.md Fixes #48 #51

Closed
