-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: define sideband optimization hints #705
feat: define sideband optimization hints #705
Conversation
proto/substrait/algebra.proto
Outdated
// The local system can use any numbering system it wants but for better compatibility | ||
// it is suggested to refer to computations in order of the input that they are derived | ||
// from. Computation numbers start at 1. | ||
int32 number = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I struggle with the use of this. Anchor seems sufficient to have referencing between locations and if we're trying to identify engine meaningful, I think better to just allow a hint any value on the computation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This defines a sharing group. Because there is a potentially one to many relationship for every computation we need some way of defining which group a particular reference is from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it.
I think you're saying that a single rel could save, let's say, two hashtables. And five operators could consume all/some/none of those two hash tables. I think anchor is more than enough for that. Why do we also need number?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you know if the first hashtable or the second hashtable was supposed to be output for that computation group? The same goes for the consuming side -- how do you know if the that computation is destined for the first or the second hashtable slot? Anchor (or computation group id) tells you which relations are consuming it and not where in the relation it is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main thought is the "how the number maps to an internal structure" is opaque to consumers, whether it is one number or two. Whether we have anchor1 and anchor2 as the left and right or anchor1,number1 and anchor1,number2 doesn't impact that the mapping is hidden/implicit.
For now I suggest that we just eliminate the second number.
It would be helpful to have example plans for this feature. In the abstract I understand the intent of what we're trying to accomplish, but I think it would make a lot more sense in context. |
55cba8b
to
e290d32
Compare
I've added some documentation that I hope addresses this concern. |
proto/substrait/algebra.proto
Outdated
// The local system can use any numbering system it wants but for better compatibility | ||
// it is suggested to refer to computations in order of the input that they are derived | ||
// from. Computation numbers start at 1. | ||
int32 number = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main thought is the "how the number maps to an internal structure" is opaque to consumers, whether it is one number or two. Whether we have anchor1 and anchor2 as the left and right or anchor1,number1 and anchor1,number2 doesn't impact that the mapping is hidden/implicit.
For now I suggest that we just eliminate the second number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @EpsilonPrime !
Since this isn't breaking, I think we can merge with two SMC+1, right? So @EpsilonPrime and myself should be enough. Right? |
@EpsilonPrime , pinging on my question here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved in the sense of "let's get something out there and see how it works." We can probably refine slightly as we go.
I suppose that is technically true since we didn't explicitly mark |
No description provided.