CUDA error causes the micp_localization node to die #2

Mh-Magdy · 2024-10-08T11:21:20Z

First of all thank you for this great package and the amazing work.
I'm running the micp node with combining unit = cpu and backend optix, and everything is OK. When I changed the combining unit to gpu, I got the following error:

I changed it back to cpu and got the same error. I also changed the backend to embree and got the same error :(

Could you please help me get past this error?

Mh-Magdy · 2024-10-08T11:22:03Z

The complete configuration file for referencing:

base_frame: base_link
map_frame: map
odom_frame: odom
tf_rate: 50
micp:
  combining_unit: cpu
  corr_rate_max: 500
  adaptive_max_dist: True #
  viz_corr: True
  print_corr_rate: False
  disable_corr: False
  trans: [0.0, 0.0, 0.0]
  rot: [0.0, 0.0, 0.0] # euler angles (3) or quaternion (4)

sensors:
  velodyne:
    topic: mid/points
    type: spherical
    model:
      range_min: 0.5
      range_max: 90.0
      phi_min: -0.261799067259
      phi_inc: 0.03490658503988659
      phi_N: 16
      theta_min: -3.14159011841
      theta_inc: 0.01431249500496489
      theta_N: 440
    micp:
      max_dist: 2.0
      adaptive_max_dist_min: 0.15
      backend: optix
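
The angular ranges implied by the spherical model values in this configuration can be checked with a short Python sketch. It assumes the i-th sample lies at min + i * inc for i = 0 .. N-1; this sampling rule is an assumption made for illustration and is not stated in this thread. The helper names spherical_coverage and inc_for_span are hypothetical and not part of rmcl or rmagine.

```python
import math


def spherical_coverage(angle_min, angle_inc, n):
    """Return the first and last sampled angle in radians.

    Assumption: sample i lies at angle_min + i * angle_inc, i = 0 .. n-1.
    """
    return angle_min, angle_min + (n - 1) * angle_inc


def inc_for_span(span, n):
    """Increment that spreads n samples over `span` radians, both ends included."""
    return span / (n - 1)


# Values copied from the configuration above.
phi_first, phi_last = spherical_coverage(-0.261799067259, 0.03490658503988659, 16)
theta_first, theta_last = spherical_coverage(-3.14159011841, 0.01431249500496489, 440)

print("vertical range [deg]:  ", math.degrees(phi_first), math.degrees(phi_last))
print("horizontal range [deg]:", math.degrees(theta_first), math.degrees(theta_last))

# When changing theta_N to match a real sensor, theta_inc has to change with it.
# Under the assumption above, a full turn with 440 samples gives:
print("theta_inc for 440 samples:", inc_for_span(2.0 * math.pi, 440))
```

Under that assumption, the vertical samples span roughly -15 to +15 degrees and the horizontal samples span one full turn, so theta_inc and theta_N need to be updated together when adapting the model to a different sensor.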

Mh-Magdy · 2024-10-11T08:06:30Z

I changed the CUDA toolkit version from 12.6 to 11.8 and the issue disappeared.

amock · 2024-10-12T06:07:15Z

Hi @Mh-Magdy,

Thanks for testing. However, that's weird. Normally, it should run with any CUDA version, so I would say it's still an issue. I will reopen it as a reminder for me to check this.

Could you give me some more info about your setup that you used?

operating system
cuda version (nvcc --version)
gpu driver version (nvidia-smi)
OptiX version
ROS version
rmcl version or branch
rmagine version or branch
did it also fail when running the https://github.com/amock/rmcl_example with your GPU config file?

With this I think I could reproduce the error and hopefully fix it soon. (Or someone else)
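
Several of the items in the list above can be gathered in one go with a small Python sketch like the following. It only wraps the commands already named (nvcc --version, nvidia-smi) plus rosversion -d for the ROS distribution; the function names run and collect_setup_info are hypothetical. The OptiX version and the rmcl/rmagine branches are not covered and still have to be looked up by hand.

```python
import platform
import shutil
import subprocess


def run(cmd):
    """Run cmd and return its output, or a short note if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return f"{cmd[0]} not found on PATH"
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"failed: {exc}"
    # Some tools report problems on stderr only, so fall back to it.
    return (result.stdout or result.stderr).strip()


def collect_setup_info():
    """Collect OS, CUDA toolkit, GPU driver, and ROS distribution as strings."""
    return {
        "operating system": platform.platform(),
        "cuda version (nvcc --version)": run(["nvcc", "--version"]),
        "gpu driver version (nvidia-smi)": run(
            ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]
        ),
        "ROS version (rosversion -d)": run(["rosversion", "-d"]),
    }


if __name__ == "__main__":
    for key, value in collect_setup_info().items():
        print(f"{key}:\n{value}\n")
```

The output can be pasted directly into a reply. Missing tools are reported as "not found on PATH" instead of raising, so the script also runs on machines without CUDA or ROS installed.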

Mh-Magdy · 2024-10-16T06:22:41Z

Hiii @amock 👋

OS: Ubuntu 20.04
CUDA: 12.6
GPU driver: nvidia-driver-560 - third-party non-free recommended
OptiX: 7.7
ROS: ROS1 noetic
RMCL: noetic branch
rmagine: latest main branch on GitHub
Actually, it worked just fine with the example configuration for the CPU version and the embree backend, but with the GPU it gave me the error I mentioned above. When I changed the CUDA version and the GPU driver version, it worked (with the example config).

There are some other minor issues that I ran into recently after the update. I will report them to you in more detail later, but here is a hint about them for now:

When I edit the sensor parameters, for example changing the number of horizontal samples to match my real sensor (theta_inc, theta_N), the package fails to run if the backend is optix, but works fine if I change it to embree. I will capture any issues like these and give you details on each issue and my environment/setup, as well as the config, to help you reproduce the errors.

Thank you Alexander

amock · 2024-11-05T12:51:34Z

Hi @Mh-Magdy,

I have finally found some time to deal with your issue. First I tried to replicate your setup:

Ubuntu 20.04.6 LTS
ROS1 noetic
GPU: RTX 2060 Super, Driver 560.35.03, CUDA V12.6.77
OptiX 7.7
rmagine: (main branch) 2.2.7
rmcl: noetic branch
rmcl_msgs: noetic branch
rmcl_example: noetic branch

In the first terminal I started the example simulation by executing:

roslaunch rmcl_example start_robot.launch

Then I changed rmcl_example/launch/rmcl_micp.launch to load the parameters from rmcl_example/config/micp_gpu.yaml.
This config uses the GPU for both computing the correspondences and combining the covariances.
So I assumed this setup would cause the error you described.
Then I executed:

roslaunch rmcl_example rmcl_micp.launch

In the RViz window I set an initial pose guess and everything went fine. So unfortunately, I could not reproduce the error you described. Could you maybe try the exact same procedure on your system? Otherwise I am not sure what is wrong on your system :/ Perhaps you could also check if rmagine alone is working. There are some benchmark executables in it. Or perhaps you could check if CUDA is working for other projects.

Best
Alex

Mh-Magdy closed this as completed Oct 11, 2024

amock reopened this Oct 12, 2024
