Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lookup transform predictably misses the translation part in multi-node setup #566

Open
roshambo919 opened this issue Aug 19, 2024 · 1 comment

Comments

@roshambo919
Copy link

roshambo919 commented Aug 19, 2024

This unexpected tf2 lookup behavior has been stumping me for days on ros2 humble.

I have the following frames:

world
agent0
agent0/lidar0
agent1
agent1/lidar0
... # etc up to agent 2

In this particular example, agent0 is a mobile agent, so the frame agent0 to world is published dynamically at a high rate. All other frames (e.g., agent0/lidar0 to agent0, agent1 to world, agent1/lidar0 to agent1, ...) are static and published on tf_static from a rosbag once at the beginning.

I have several nodes that get the transformations in python via buffer.lookup_transform(). One of the nodes ("node1") looks up all of the agenti/lidar0 to world frames and never has any problems. An example logging output would show, e.g.,

agent0 : geometry_msgs.msg.TransformStamped(
     header=std_msgs.msg.Header(
          stamp=builtin_interfaces.msg.Time(sec=0, nanosec=940000000), frame_id='world'),
          child_frame_id='agent0/lidar0',
     transform=geometry_msgs.msg.Transform(
          translation=geometry_msgs.msg.Vector3(x=-87.25453507565153, y=-27.90808621333283, z=1.5932031571520584), 
          rotation=geometry_msgs.msg.Quaternion(x=-2.399310128784853e-05, y=-1.7728866372079825e-05, z=-0.0013892804829969047, w=0.9999990345044129))
)

agent1 : geometry_msgs.msg.TransformStamped(
     header=std_msgs.msg.Header(
          stamp=builtin_interfaces.msg.Time(sec=0, nanosec=940000000), frame_id='world'),
          child_frame_id='agent1/lidar0', 
     transform=geometry_msgs.msg.Transform(
          translation=geometry_msgs.msg.Vector3(x=-67.25457001, y=-27.96375847, z=20.60000038), 
          rotation=geometry_msgs.msg.Quaternion(x=0.00035956799826580756, y=0.25881879533495106, z=-0.0013419260383038315, w=0.9659248941438082))
)
...

Another one of the nodes ("node2") performs what I believe to be the exact same lookup process. Unfortunately, the output is incorrect. While sometimes the very first lookup call produces the expected output, every subsequent lookup (and sometimes even the first) completely neglects the translation component of the static transforms. Thus, an output from node2's lookups would be:

agent0 : geometry_msgs.msg.TransformStamped(
     header=std_msgs.msg.Header(
          stamp=builtin_interfaces.msg.Time(sec=0, nanosec=940000000), frame_id='world'),
          child_frame_id='agent0/lidar0',
     transform=geometry_msgs.msg.Transform(
          translation=geometry_msgs.msg.Vector3(x=-87.25453507565153, y=-27.90808621333283, z=1.5932031571520584), 
          rotation=geometry_msgs.msg.Quaternion(x=-2.399310128784853e-05, y=-1.7728866372079825e-05, z=-0.0013892804829969047, w=0.9999990345044129))
)

agent1:geometry_msgs.msg.TransformStamped(
     header=std_msgs.msg.Header(
          stamp=builtin_interfaces.msg.Time(sec=0, nanosec=940000000), frame_id='world'),
          child_frame_id='agent1/lidar0', 
     transform=geometry_msgs.msg.Transform(
          translation=geometry_msgs.msg.Vector3(x=0.0, y=0.0, z=0.0),  # THIS IS THE PROBLEM!!!!!
          rotation=geometry_msgs.msg.Quaternion(x=0.00035956799826580756, y=0.25881879533495106, z=-0.0013419260383038315, w=0.9659248941438082))
)

where everything is the same except that the translation has been neglected to be populated. The quaternion is correct. Agent0 (being dynamic and on /tf) is correct, HOWEVER, I will note that in node2 I sometimes get an extrapolation error for agent0 that does not appear in node1.

In both cases of node1 and node2, I am using MessageFilters to synchronize multiple topics for a callback. It just so happens that node1 is working while node2 is not. It should be known that the output of node1 is one of the synchronized inputs for node2 (i.e. node2 depends on node 1).

I'm at a loss for how to debug this. Has anyone come across this? @tfoote. My code base is large, but if needed I could spend some time trying to recreate in an MWE.

Addendum:

  • both ros2 run tf2_ros tf2_echo world agent1/lidar0 and ros2 topic echo /tf_static produce the anticipated output
@tfoote
Copy link
Member

tfoote commented Aug 19, 2024

Unfortunately, the output is incorrect. While sometimes the very first lookup call produces the expected output, every subsequent lookup (and sometimes even the first) completely neglects the translation component of the static transforms.

Inconsistent behaviors like this suggest that there's a different data source, possibly conflicting or a race conditoin? Is there a conflicting static transform in the bag file for world to agent0 perhaps?

All other frames (e.g., agent0/lidar0 to agent0, agent1 to world, agent1/lidar0 to agent1, ...) are static and published on tf_static from a rosbag once at the beginning.

Using a dedicated static_transform_publisher will be more reliable: https://docs.ros.org/en/rolling/Releases/Release-Humble-Hawksbill.html#more-intuitive-command-line-arguments-for-static-transform-publisher https://docs.ros.org/en/rolling/p/tf2_ros/cli_tools.html

If you're cascadng responses it's doing the same lookup at a later time, which further reinforces the likelyhood that there's a race condition with potentially conflicting data sources being interleaved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants