Lookup transform predictably misses the translation part in multi-node setup #566

roshambo919 · 2024-08-19T19:42:41Z

This unexpected tf2 lookup behavior has been stumping me for days on ros2 humble.

I have the following frames:

world
agent0
agent0/lidar0
agent1
agent1/lidar0
... # etc up to agent 2

In this particular example, agent0 is a mobile agent, so the frame agent0 to world is published dynamically at a high rate. All other frames (e.g., agent0/lidar0 to agent0, agent1 to world, agent1/lidar0 to agent1, ...) are static and published on tf_static from a rosbag once at the beginning.

I have several nodes that get the transformations in python via buffer.lookup_transform(). One of the nodes ("node1") looks up all of the agenti/lidar0 to world frames and never has any problems. An example logging output would show, e.g.,

agent0 : geometry_msgs.msg.TransformStamped(
     header=std_msgs.msg.Header(
          stamp=builtin_interfaces.msg.Time(sec=0, nanosec=940000000), frame_id='world'),
          child_frame_id='agent0/lidar0',
     transform=geometry_msgs.msg.Transform(
          translation=geometry_msgs.msg.Vector3(x=-87.25453507565153, y=-27.90808621333283, z=1.5932031571520584), 
          rotation=geometry_msgs.msg.Quaternion(x=-2.399310128784853e-05, y=-1.7728866372079825e-05, z=-0.0013892804829969047, w=0.9999990345044129))
)

agent1 : geometry_msgs.msg.TransformStamped(
     header=std_msgs.msg.Header(
          stamp=builtin_interfaces.msg.Time(sec=0, nanosec=940000000), frame_id='world'),
          child_frame_id='agent1/lidar0', 
     transform=geometry_msgs.msg.Transform(
          translation=geometry_msgs.msg.Vector3(x=-67.25457001, y=-27.96375847, z=20.60000038), 
          rotation=geometry_msgs.msg.Quaternion(x=0.00035956799826580756, y=0.25881879533495106, z=-0.0013419260383038315, w=0.9659248941438082))
)
...

Another one of the nodes ("node2") performs what I believe to be the exact same lookup process. Unfortunately, the output is incorrect. While sometimes the very first lookup call produces the expected output, every subsequent lookup (and sometimes even the first) completely neglects the translation component of the static transforms. Thus, an output from node2's lookups would be:

agent0 : geometry_msgs.msg.TransformStamped(
     header=std_msgs.msg.Header(
          stamp=builtin_interfaces.msg.Time(sec=0, nanosec=940000000), frame_id='world'),
          child_frame_id='agent0/lidar0',
     transform=geometry_msgs.msg.Transform(
          translation=geometry_msgs.msg.Vector3(x=-87.25453507565153, y=-27.90808621333283, z=1.5932031571520584), 
          rotation=geometry_msgs.msg.Quaternion(x=-2.399310128784853e-05, y=-1.7728866372079825e-05, z=-0.0013892804829969047, w=0.9999990345044129))
)

agent1:geometry_msgs.msg.TransformStamped(
     header=std_msgs.msg.Header(
          stamp=builtin_interfaces.msg.Time(sec=0, nanosec=940000000), frame_id='world'),
          child_frame_id='agent1/lidar0', 
     transform=geometry_msgs.msg.Transform(
          translation=geometry_msgs.msg.Vector3(x=0.0, y=0.0, z=0.0),  # THIS IS THE PROBLEM!!!!!
          rotation=geometry_msgs.msg.Quaternion(x=0.00035956799826580756, y=0.25881879533495106, z=-0.0013419260383038315, w=0.9659248941438082))
)

where everything is the same except that the translation has been neglected to be populated. The quaternion is correct. Agent0 (being dynamic and on /tf) is correct, HOWEVER, I will note that in node2 I sometimes get an extrapolation error for agent0 that does not appear in node1.

In both cases of node1 and node2, I am using MessageFilters to synchronize multiple topics for a callback. It just so happens that node1 is working while node2 is not. It should be known that the output of node1 is one of the synchronized inputs for node2 (i.e. node2 depends on node 1).

I'm at a loss for how to debug this. Has anyone come across this? @tfoote. My code base is large, but if needed I could spend some time trying to recreate in an MWE.

Addendum:

both ros2 run tf2_ros tf2_echo world agent1/lidar0 and ros2 topic echo /tf_static produce the anticipated output

The text was updated successfully, but these errors were encountered:

tfoote · 2024-08-19T20:02:32Z

Unfortunately, the output is incorrect. While sometimes the very first lookup call produces the expected output, every subsequent lookup (and sometimes even the first) completely neglects the translation component of the static transforms.

Inconsistent behaviors like this suggest that there's a different data source, possibly conflicting or a race conditoin? Is there a conflicting static transform in the bag file for world to agent0 perhaps?

All other frames (e.g., agent0/lidar0 to agent0, agent1 to world, agent1/lidar0 to agent1, ...) are static and published on tf_static from a rosbag once at the beginning.

Using a dedicated static_transform_publisher will be more reliable: https://docs.ros.org/en/rolling/Releases/Release-Humble-Hawksbill.html#more-intuitive-command-line-arguments-for-static-transform-publisher https://docs.ros.org/en/rolling/p/tf2_ros/cli_tools.html

If you're cascadng responses it's doing the same lookup at a later time, which further reinforces the likelyhood that there's a race condition with potentially conflicting data sources being interleaved.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lookup transform predictably misses the translation part in multi-node setup #566

Lookup transform predictably misses the translation part in multi-node setup #566

roshambo919 commented Aug 19, 2024 •

edited

Loading

tfoote commented Aug 19, 2024

Lookup transform predictably misses the translation part in multi-node setup #566

Lookup transform predictably misses the translation part in multi-node setup #566

Comments

roshambo919 commented Aug 19, 2024 • edited Loading

tfoote commented Aug 19, 2024

roshambo919 commented Aug 19, 2024 •

edited

Loading