-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement motion plan cache #16
Conversation
5827839
to
6a7a3d9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for great effort here! Did a first pass review with some comments.
I'm concerned we're not using the APIs from PlanningSceneStorage
class that is already implemented upstream for such caching. It looks like they serialize the MotionPlanRequest msg to check if it's cached before updating the database. Is there a reason we're not able to simply reuse the approach there?
Lastly, as part of this implementation we should also
- Update the nexus_msgs/stv/GetMotionPlan request to add a boolean to force lookup from the database (ReadOnly in the review comments) and fail the response if the trajectory does not exist in the databse.
- Update the PlanMotion BT node to add a similar input port as above which will append the value to the request.
* - Final pose (wrt. `planning_frame` (usually `base_link`)) | ||
* - Final robot joint states | ||
* | ||
* Motion plans may be looked up with some tolerance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mention the name of the ROS param to set the tolerance here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no ROS param that sets the tolerance here.
Instead it's passed in per lookup call on the motion plan server, since this class is meant to be a standalone class.
const moveit_msgs::msg::RobotTrajectory& plan, | ||
double execution_time_s, | ||
double planning_time_s, | ||
bool overwrite = true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a scenario where we do not want to overwrite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In most cases, no. However:
- If
exact_match_precision
is set very high (relative to start and goal match tolerance), then overwriting will result in the cache being pretty sparse.- Implication: Not overwriting results in the cache space being maximally dense, which maximizes the chances of a cache hit on lookup. This is at the cost of cache storage size as well as (probably) the speed of lookups.
- Also it can be used to see historically how plans have evolved over time.
I think for all those reasons I'd like to keep a parameter here to disable overwriting, especially if we plan to upstream this cache class to MoveIt!
Note: Overwriting in the context of the cache means deleting ALL "worse" plans that have an "exact" match (according to the exact_match_precision
amount, across all cache entry metadata fields), before inserting the new "better" entry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this behavior should be documented, iirc put_plan
only inserts a new plan if it is better than existing plans. So overwrite
has no effect if it is not the best plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rclcpp::Node::SharedPtr node_; | ||
warehouse_ros::DatabaseConnection::Ptr db_; | ||
|
||
double exact_match_precision_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numerical precision.
double exact_match_precision_ = 0; | |
double exact_match_precision_ = 1e-6; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I set this as a default on the ROS param, but no harm setting it on the variable as well!
(Note: I noticed that, even on sim when the robot is commanded to the same location each time, a value of 0.001 (~0.05 degrees per joint) seemed more appropriate.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string _cache_db_plugin; | ||
std::string _cache_db_host; | ||
int _cache_db_port; | ||
double _cache_exact_match_tolerance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of three doubles, let's define an enum class for each of these cases. Then have two params: 1) To get the string name of the enum type and match as defined above 2) The value of the tolerance. You could also consider using std::variant
here and define 3 three different structs with boolean values of tolerance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any cache will use all three of these tolerance values (exact matches for population, and start and goal for plan fetches.)
More importantly, these values should be configurable by the user.
I don't understand why we would use an enum or variant in that case?
Maybe I'm missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think enums and variants don't quite make sense here also. Perhaps there is confusion with the naming having both "exact_match" and "tolerance", the fomer suggest a bool and the latter suggest a double.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Motion plan caching | ||
std::shared_ptr<nexus::motion_planner::MotionPlanCache> _motion_plan_cache; | ||
|
||
bool _use_motion_plan_cache; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of three bools, let's define class which defines how the databse will be used for better extensibility.
enum class PlannerDatabaseMode: public unint8_t
{
// The planner will generate a plan. Find if an entry exists in the databse and overwrite if execution time is better.
Training,
// Try to lookup a cached plan and if non is available, plan live but do not update the cache.
Optional,
// Return a plan only if present in the database cache.
// ReadOnly,
};
And below we would store an optional of this type.
std::optional<PlannerDatabaseMode> = std::nullopt;
We can parse a similarly named ROS param which takes in a string with the same name as the enum variants above. If we have a matching name, we set the member to the appropriate enum; Else leave it as optional. In the service response, if this member is nullopt, do a live plan (same as now). Else if not nullopt, plan and update database accordingly.
We should also instantiate the _motion_plan_cache
only if this variable is not nullopt and other database specific parameters are defined (port, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
d85907d , with an unset enum value instead of using a nullopt.
tf_listener_ = | ||
std::make_shared<tf2_ros::TransformListener>(*tf_buffer_); | ||
|
||
warehouse_ros::DatabaseLoader loader(node_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should use the DatabaseLoader
class as it is forcing us to store (and spin) a separate ROS node in the MotionPlanner class. Looking at it's implementation, it is a simple wrapper to retrieve ROS Params and load the right storage plugin. We're already retrieving some of the params in the MotionPlanner
class, we can also try loading the plugin within the on_configure()
and directly instantiate the db
object via pluginlib. This would save us the need to spin another node which brings the total node count to three! (we already have one for this server, one for the move_group_interface).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We unfortunately still need a node for the tf listener.
I will just use the motion planner's node for both the tf listener and the database loading.
Edit: I can't use the motion planner's node because it's a lifecycle node :( I will use the internal_node instead...
Hey Yadu, I just saw this review! Thanks for the review! The Whereas for us, we need to be able to do fuzzy lookups to match "close enough" start and goal states to support caching and lookups for the purposes of planning.
Ah, I had put it as a parameter on motion planning service's node, but I can just as easily put it in the motion plan request instead. |
3ed60c6
to
d8f3791
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is quite a lot of logic going on in maintaining the cache, the lack of tests is abit worrying. But I will leave it to @Yadunund to decide if this is a blocker.
std::string _cache_db_plugin; | ||
std::string _cache_db_host; | ||
int _cache_db_port; | ||
double _cache_exact_match_tolerance; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think enums and variants don't quite make sense here also. Perhaps there is confusion with the naming having both "exact_match" and "tolerance", the fomer suggest a bool and the latter suggest a double.
const moveit_msgs::msg::RobotTrajectory& plan, | ||
double execution_time_s, | ||
double planning_time_s, | ||
bool overwrite = true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this behavior should be documented, iirc put_plan
only inserts a new plan if it is better than existing plans. So overwrite
has no effect if it is not the best plan.
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
1c09ef5
to
ca0f123
Compare
Thanks for the review! I'll address all the comments in a separate PR (the branching structure makes it difficult to be updating it here, since changes need to be duplicated across the two PRs at the moment.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pre-approving for when the tests are merged.
* Add cartesian plan caching interfaces Signed-off-by: methylDragon <[email protected]> * Add construct_get_cartesian_plan_request Signed-off-by: methylDragon <[email protected]> * Add goal query and metadata Signed-off-by: methylDragon <[email protected]> * Add start query and metadata Signed-off-by: methylDragon <[email protected]> * Implement top level cache ops Signed-off-by: methylDragon <[email protected]> * Use motion plan cache for cartesian plans Signed-off-by: methylDragon <[email protected]> * Allow mismatched plan frames since we coerce anyway Signed-off-by: methylDragon <[email protected]> * Fix move bug Signed-off-by: methylDragon <[email protected]> * Plan cache code review fixes (sans unit tests) (#26) * Remove query appending macro Signed-off-by: methylDragon <[email protected]> * Default to warehouse_ros plugin if warehouse plugin isn't set Signed-off-by: methylDragon <[email protected]> * Return and use init result Signed-off-by: methylDragon <[email protected]> * Add todo for catching exceptions Signed-off-by: methylDragon <[email protected]> * Implement plan fetching with configurable key Signed-off-by: methylDragon <[email protected]> * Add comments for exact match tolerance Signed-off-by: methylDragon <[email protected]> * Slightly refactor put plan Signed-off-by: methylDragon <[email protected]> * Rename overwrite to delete_worse_plans Signed-off-by: methylDragon <[email protected]> * Split out motion plan cache into its own library Signed-off-by: methylDragon <[email protected]> * Sort constraints for reduced cardinality Signed-off-by: methylDragon <[email protected]> * Rename util function Signed-off-by: methylDragon <[email protected]> * Add todo for is_diff Signed-off-by: methylDragon <[email protected]> * Add unit tests for motion plan cache (#28) * Add count methods Signed-off-by: methylDragon <[email protected]> * Enable shared from this for cache class Signed-off-by: methylDragon <[email protected]> * Add unit test build rules Signed-off-by: methylDragon <[email protected]> * Add tests for motion plan cache (but not cartesian) Signed-off-by: methylDragon <[email protected]> * Fix bugs in cartesian caching Signed-off-by: methylDragon <[email protected]> * Add tests for cartesian plan cache Signed-off-by: methylDragon <[email protected]> * Exit if a test fails Signed-off-by: methylDragon <[email protected]> * Remove gtest import Signed-off-by: methylDragon <[email protected]> * Remove enable_shared_from_this Signed-off-by: methylDragon <[email protected]> * Only check for failure Signed-off-by: methylDragon <[email protected]> * Test half in-tolerance Signed-off-by: methylDragon <[email protected]> * Test different robot cache Signed-off-by: methylDragon <[email protected]> * Add force_cache_mode_execute_read_only (#29) * Add force_cache_mode_execute_read_only Signed-off-by: methylDragon <[email protected]> * Add force_cache_mode_execute_read_only input port Signed-off-by: methylDragon <[email protected]> --------- Signed-off-by: methylDragon <[email protected]> --------- Signed-off-by: methylDragon <[email protected]> --------- Signed-off-by: methylDragon <[email protected]> --------- Signed-off-by: methylDragon <[email protected]>
Signed-off-by: methylDragon <[email protected]>
b713af3
to
d757b8c
Compare
* Fix and add motion planner server tests Signed-off-by: methylDragon <[email protected]> * Add test deps Signed-off-by: methylDragon <[email protected]> * Add motion planner server test to CI Signed-off-by: methylDragon <[email protected]> --------- Signed-off-by: methylDragon <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this amazing effort @methylDragon!
WARNING
Description
This PR adds a
MotionPlanCache
class that wraps awarehouse_ros
database interface.It is then incorporated into the motion_planning_server.
Features:
RobotTrajectory
plans keyed byMotionPlanRequest
message valuesFurthermore, because our goals are stated in zero-offset TFs, the cache does a TF lookup and restates them in the database with respect to the workspace frame! (i.e. wrt. base_link.)
I could only test this on sqlite3. I couldn't set up MongoDB on my setup.
Also, while this is not 100% generalizable to all moveit plannable plans, I think this is a good candidate for upstreaming! 🎉
Why is this PR so big?
It's not trivial to key the motion plans. This is because there are many factors that go into the plan request, and the fulfilling plan. It should be visible from how I'm handling each factor on review.
I'll attach example requests, and an example generated trajectory. On inspecting those messages, it should be clear that it's not trivial to key them (but I think I managed to do it!)
Start State (MotionPlanRequest) Variations
`is_diff=true`, which requires getting the current state
Or if joint states were explicitly provided as start
Notice for both cases, the goal is stated in terms of a TF frame (which can change.)
The motion plan cache restates the goal in terms of an x, y, z and orientation offset from the planning frame (which should not change.)
Planning frame is consequently part of the key for cache lookups.
Generated Plan Example
Insight
Notice that keying them requires considering many things, like the varying number of joints, joint constraints; orientation constraints; position constraints; planning frames, etc.
I picked what I think is a good set that covers most of our cases. What I am explicitly not supporting is stated below.
Images
Cache hits and pushes!
A view of the schema
With database collections for each move group
Populated Entries
Notes
Does not support:
It also only partially handles matching in cases where constraints get jumbled up in order. (But this really shouldn't happen, I think.)
TODO