Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roscpp ActionClient subscription queue size should be consistent with rospy. #162

Merged

Conversation

fujitatomoya
Copy link
Contributor

related to #144

roscpp ActionClient subscription queue size should be more than 1, and also consistent with rospy.

Signed-off-by: Tomoya.Fujita [email protected]

…d also consistent with rospy.

Signed-off-by: Tomoya.Fujita <[email protected]>
@fujitatomoya
Copy link
Contributor Author

fujitatomoya commented Mar 16, 2020

i am not sure what went wrong with my fix, all i see RESULT: SUCCESS from the CI log. could anyone help me out?

@fujitatomoya
Copy link
Contributor Author

how do we kick start the CI once again?
/CI

@StefanFabian
Copy link

I assume only maintainers can trigger a rebuild.
Maybe @mjcarroll can help?

@mjcarroll
Copy link
Member

@ros-pull-request-builder retest this please

@mjcarroll
Copy link
Member

It looks like we're still getting a timeout here.

@fujitatomoya would you like to try to expand the timeout as @StefanFabian suggested?

@fujitatomoya
Copy link
Contributor Author

according to testlog, 90 seconds gives it enough buffer.

@fujitatomoya
Copy link
Contributor Author

@ros-pull-request-builder retest this please

@StefanFabian
Copy link

StefanFabian commented Mar 17, 2020

according to testlog, 90 seconds gives it enough buffer.

I don't know where you get that from since the test is preempted after 60 seconds without information about how much longer it might take.
However, the new error does indicate that time may not be the problem.
The problem is that the client is still in the ACTIVE state after 60 seconds.
The question is whether that is because of errors in the communication, i.e., the server is finished but the client doesn't know about it, or if the server is actually not finished yet.
Have you tried running the tests locally?

@fujitatomoya
Copy link
Contributor Author

@StefanFabian

Have you tried running the tests locally?

of course, i have.

root@9b04ce524e20:~/ros_ws/catkin_ws# catkin_make run_tests actionlib
...<snip>
[actionlib.rosunit-test_cpp_simple_client/just_succeed][passed]
[actionlib.rosunit-test_cpp_simple_client/just_abort][passed]
[actionlib.rosunit-test_cpp_simple_client/just_preempt][passed]
[actionlib.rosunit-test_cpp_simple_client/drop][passed]
[actionlib.rosunit-test_cpp_simple_client/exception][passed]
[actionlib.rosunit-test_cpp_simple_client/ignore_cancel_and_succeed][passed]
[actionlib.rosunit-test_cpp_simple_client/lose][passed]
[actionlib.rosunit-test_py_simple_client/test_drop][passed]
[actionlib.rosunit-test_py_simple_client/test_exception][passed]
[actionlib.rosunit-test_py_simple_client/test_ignore_cancel_and_succeed][passed]
[actionlib.rosunit-test_py_simple_client/test_just_abort][passed]
[actionlib.rosunit-test_py_simple_client/test_just_preempt][passed]
[actionlib.rosunit-test_py_simple_client/test_just_succeed][passed]
[actionlib.rosunit-test_py_simple_client/test_lose][passed]

SUMMARY
 * RESULT: SUCCESS
 * TESTS: 14
 * ERRORS: 0
 * FAILURES: 0
...<snip>

@fujitatomoya
Copy link
Contributor Author

according to the test history,
http://build.ros.org/job/Npr__actionlib__ubuntu_focal_amd64/27/testReport/junit/(root)/SimpleClientFixture/just_succeed/history/
http://build.ros.org/job/Npr_db__actionlib__debian_buster_amd64/27/testReport/(root)/SimpleClientFixture/just_succeed/history/

client gets timeout via waitForResult, and goal is still in ACTIVE state because server does not finish the goal and never sends back the result. I am not sure why server is not responding but connected to the client since this is not reproducible in my environment.

I am negative on expanding timeout for the test, so reverting that commit.

@fujitatomoya
Copy link
Contributor Author

let's see what's gonna happen at this time,

@ros-pull-request-builder retest this please

@StefanFabian
Copy link

Perhaps both the timeout of the test and the timeout of the waitForResult need to be increased?
I've had a look at the tests and in theory, the tests could be completed in ~20 seconds if there was no execution time except for waiting.
I would assume that even a python action server should not take more than a second to handle a request which would result in at most 30 seconds of runtime even on limited hardware but who knows.
Maybe you should try increasing the time limit of the test to 3 minutes and the client wait timeout to 120 seconds.
I would keep the test timeout greater than the client wait timeout because the error message that the server didn't answer in time is a lot more helpful than the message that the test was preempted.

@fujitatomoya
Copy link
Contributor Author

@StefanFabian

alright, let's give it a shot.

@fujitatomoya
Copy link
Contributor Author

as expected, does not fix the test failure. Server side still in active state, i certainly need to access the CI system to debug this problem to check what is happening on server side. any thoughts? BTW, i will revert the be7c705 cz this was just a trial.

@fujitatomoya
Copy link
Contributor Author

@ros-pull-request-builder retest this please

@fujitatomoya
Copy link
Contributor Author

still unstable, I do not know what to do. any thoughts?

@StefanFabian
Copy link

Without being able to debug, I can only make wild guesses.
You could modify the tests to add more logging and give each of the tests a unique result code so each call can be traced in the logs..
I'm seeing a lot of TCP/IP connection failures, not sure if that's relevant.

@fujitatomoya
Copy link
Contributor Author

@ros-pull-request-builder retest this please

Copy link
Member

@mjcarroll mjcarroll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the contribution and trying to figure out the flaky test issues.

@mjcarroll mjcarroll merged commit 8ac7539 into ros:noetic-devel Apr 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants