Shape mismatch when training #48

Open
martinarroyo opened this issue May 17, 2023 · 4 comments

Comments

@martinarroyo

Steps taken:

  • Cloned main branch today.
  • Downloaded ECSSD and MSRA10K.
  • Set up repository with PyTorch 1.8.
  • Launched bash train_eval.sh.

Right on the first training step, the following error is raised:

  File "/opt/conda/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/aot-benchmark/tools/train.py", line 18, in main_worker
    trainer.sequential_training()
  File "./networks/managers/trainer.py", line 473, in sequential_training
    use_prev_prob=use_prev_prob)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "./networks/engines/aot_engine.py", line 52, in forward
    self.add_reference_frame(frame_step=0, obj_nums=obj_nums)
  File "./networks/engines/aot_engine.py", line 239, in add_reference_frame
    size_2d=self.enc_size_2d)
  File "./networks/models/aot.py", line 105, in LSTT_forward
    pos_emb, size_2d)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "./networks/layers/transformer.py", line 113, in forward
    size_2d=size_2d)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "./networks/layers/transformer.py", line 352, in forward
    tgt3 = self.short_term_attn(local_Q, local_K, local_V)[0]
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "./networks/layers/attention.py", line 525, in forward
    output = agg_value + agg_bias
RuntimeError: The size of tensor a (32) must match the size of tensor b (256) at non-singleton dimension 3

I did not alter any of the configuration values. I noticed that #2 had the same issue, but no solution was provided there. Any help would be much appreciated!

Thanks in advance
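
For readers unfamiliar with this class of error: the message comes from PyTorch's broadcasting rule, which compares the two shapes from the last dimension backwards and accepts a pair of sizes only if they are equal or one of them is 1. The pure-Python sketch below mirrors that rule so the failing dimension can be located without running the model. The shapes in the usage note are hypothetical; the traceback only reveals that dimension 3 has size 32 on one side and 256 on the other.

```python
from itertools import zip_longest


def first_broadcast_mismatch(shape_a, shape_b):
    """Return (dim, size_a, size_b) for the first incompatible dimension,
    checking from the trailing dimension backwards, or None if the two
    shapes can be broadcast together."""
    ndim = max(len(shape_a), len(shape_b))
    # Missing leading dimensions are treated as size 1.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for offset, (a, b) in enumerate(pairs):
        if a != b and a != 1 and b != 1:
            return ndim - 1 - offset, a, b
    return None
```

With made-up shapes such as `(1, 8, 900, 32)` and `(1, 8, 900, 256)`, the function reports dimension 3 with sizes 32 and 256, matching the pattern in the error above. This only explains the message; it does not identify why `agg_value` and `agg_bias` end up with different shapes.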

@94kiki

94kiki commented May 18, 2023

I also have this problem.

@yoxu515
Owner

yoxu515 commented May 19, 2023

Hello, sorry for the error. This bug occurs when "pytorch correlation" is not correctly installed, similar to issue #45. I recommend installing it; the code also runs faster with it.
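
A quick pre-flight check along these lines can confirm whether the extension is visible to the Python environment before launching training. This is a minimal sketch, not part of the repository: the module name `spatial_correlation_sampler` is an assumption about how the "pytorch correlation" extension is imported, and the helper names are hypothetical.

```python
import importlib.util

# Assumption: the "pytorch correlation" extension is importable under this
# module name. Adjust it if your installation uses a different one.
CORRELATION_MODULE = "spatial_correlation_sampler"


def is_installed(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def check_correlation_extension(module_name=CORRELATION_MODULE):
    """Return a short status message for the given module name."""
    if is_installed(module_name):
        return f"{module_name} found"
    return f"{module_name} not found; install it before training"
```

Calling `check_correlation_extension()` in the training environment returns a one-line status. Note that this only checks that the module can be located; a build that is present but broken would still pass, so actually importing the module in a Python shell is the more thorough test.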

@94kiki

94kiki commented May 21, 2023

Thank you very much!

martinarroyo added a commit to martinarroyo/aot-benchmark that referenced this issue May 23, 2023
Updates the description of PyTorch correlation to note that it is required for training (see yoxu515#48)
@martinarroyo
Author

martinarroyo commented May 23, 2023

Thanks for the clarification. I am now testing this and will report back. In the meantime, I updated the README to explain this situation to future users.
