I'm investigating how vision transformer models perform on visual odometry. As a starting point, I am using your SimpleViT implementation. I am quite new to the field, so I don't understand everything yet, and I am getting some weird results.
My output consists only of lines, and I can't figure out why. Is it because the last layer of the transformer is a linear layer? The model has 6 outputs (x, y, z, yaw, roll, pitch), and I am using a custom loss function similar to DeepVO's.
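For anyone unfamiliar with it, a DeepVO-style loss adds the squared translation error to a weighted squared rotation error and averages over samples. Below is a minimal sketch of that idea, not my actual training code. It assumes the output order (x, y, z, yaw, roll, pitch) listed above and a rotation weight of 100, the value reported in the DeepVO paper. The function name `deepvo_style_loss` and the plain-Python form are just for illustration.

```python
def deepvo_style_loss(pred, target, rot_weight=100.0):
    """Squared translation error plus weighted squared rotation error,
    averaged over samples.

    pred, target: sequences of 6-element poses, assumed to be ordered
    (x, y, z, yaw, roll, pitch).
    rot_weight: assumed default of 100, the value reported for DeepVO.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")

    total = 0.0
    for p, t in zip(pred, target):
        # First three components: translation (x, y, z).
        trans_err = sum((a - b) ** 2 for a, b in zip(p[:3], t[:3]))
        # Last three components: rotation (yaw, roll, pitch).
        rot_err = sum((a - b) ** 2 for a, b in zip(p[3:6], t[3:6]))
        total += trans_err + rot_weight * rot_err

    return total / len(pred)
```

The rotation term is weighted because angle errors are usually much smaller in magnitude than translation errors, so without the weight the translation term would dominate.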
As I said, this is all very experimental, so I don't mind bad results. I just want to understand why I can't get any usable results at all.
Thanks in advance!