Support negative preceding/following for ROW window functions #14093
This commit adds support for "offset" ROW windows, where the preceding and following window bounds are allowed to have negative values. This allows window definitions to exclude the current row entirely.

Prior to this change, ROW-based windows *had* to include the current row, causing `preceding` and `following` to support only non-negative values. Additionally, the inclusion of the current row would count against the `min_periods` check.

The following is an example of the new "negative" semantics. Consider the input:

```c++
auto const row = ints_column{1, 2, 3, 4};
```

If the window bounds are specified as (preceding=3, following=-1), then the window for the third row (`3`) is `{1, 2}`. `following=-1` indicates a "following" row *before* the current row.

A negative value for `preceding` follows the existing convention of including the current row. This makes it slightly more involved:

1. `preceding=2` indicates *one* row before the current row.
2. `preceding=1` indicates the current row.
3. `preceding=0` indicates one row past (i.e. after) the current row.
4. `preceding=-1` indicates two rows after the current row.

Et cetera.

`min_periods` checks continue to be honoured as before, but the requirement for positive `min_periods` is dropped. `min_periods` only need be non-negative.

Signed-off-by: MithunR <[email protected]>
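The bound semantics described above can be illustrated with a small host-side sketch. This is not the libcudf implementation: `window_rows` and its parameters are hypothetical names, it works on a plain `std::vector` rather than a cudf column, and it assumes bounds are simply clamped to the ends of the column. It also assumes that a window with fewer than `min_periods` rows yields no result.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical helper (not part of libcudf): returns the rows that fall in
// the ROW window of rows[idx], or std::nullopt if the window holds fewer
// than min_periods rows.
//  - preceding counts the current row: preceding=1 starts at the current row,
//    preceding=0 starts one row after it, preceding=-1 two rows after it.
//  - following does not count the current row: following=0 ends at the
//    current row, following=-1 ends one row before it.
std::optional<std::vector<int>> window_rows(std::vector<int> const& rows,
                                            int idx,
                                            int preceding,
                                            int following,
                                            int min_periods)
{
  int const n = static_cast<int>(rows.size());

  // Half-open range [start, end), clamped to the column (an assumption of
  // this sketch).
  int const start = std::clamp(idx - preceding + 1, 0, n);
  int const end   = std::clamp(idx + following + 1, 0, n);

  // A window whose bounds cross over is treated as empty.
  int const count = std::max(end - start, 0);
  if (count < min_periods) { return std::nullopt; }

  if (count == 0) { return std::vector<int>{}; }
  return std::vector<int>(rows.begin() + start, rows.begin() + end);
}
```

With the input `{1, 2, 3, 4}`, calling `window_rows(rows, 2, 3, -1, 1)` selects `{1, 2}` for the third row, matching the example in the description. For the first row the same bounds select nothing, so the call yields no result with `min_periods=1` and an empty window with `min_periods=0`.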
Also, removed prints.
/ok to test
Update: This PR is ready for review.
A note to the reviewer:
We need to update https://github.com/rapidsai/cudf/blob/branch-23.10/cpp/include/cudf/rolling.hpp#L181 to document the relaxation of `preceding_window` and `following_window`.
@@ -94,6 +93,109 @@ std::unique_ptr<column> grouped_rolling_window(table_view const& group_keys,

namespace detail {

/// Preceding window calculation functor.
template <bool preceding_less_than_1>
What are the advantages and disadvantages of having this template?
Ah, right. This is to enable the `if constexpr()` in `operator()`:

    if constexpr (preceding_less_than_1) {  // where 1 indicates only the current row.
      auto group_end = _group_offsets_begin[group_label + 1];
      return thrust::maximum{}(_preceding_window, -(group_end - 1 - idx));
    } else { ... }

When this used to be a lambda, this used to be:

    if (preceding < 1) { ... } else { ... }

That `if` check would have run once per row in the window, or in the worst case N**2 times over the column. The template is there so that the check is made only once.
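As an illustration of the pattern discussed here, the following is a host-side sketch of lifting a runtime `if` into a `bool` template parameter. It is not the code from this PR: `preceding_calculator` and `clamp_preceding` are hypothetical names, `std::max`/`std::min` stand in for `thrust::maximum`/`thrust::minimum`, and the `else` branch (elided in the snippet above) is an assumed clamp to the start of the group.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side analogue of the functor under review.
template <bool preceding_less_than_1>
struct preceding_calculator {
  int const* group_offsets;  // rows of group g are [offsets[g], offsets[g + 1])
  int const* group_labels;   // group id of each row
  int preceding_window;      // requested bound; 1 means "current row only"

  int operator()(int idx) const
  {
    auto const group_label = group_labels[idx];
    if constexpr (preceding_less_than_1) {
      // The window starts at or after the current row: do not let it run
      // past the last row of the group.
      auto const group_end = group_offsets[group_label + 1];
      return std::max(preceding_window, -(group_end - 1 - idx));
    } else {
      // Assumed branch: do not let the window start before the group.
      auto const group_start = group_offsets[group_label];
      return std::min(preceding_window, idx - group_start + 1);
    }
  }
};

// The runtime check on `preceding` happens once here, not once per row.
std::vector<int> clamp_preceding(std::vector<int> const& offsets,
                                 std::vector<int> const& labels,
                                 int preceding)
{
  std::vector<int> out(labels.size());
  auto fill = [&](auto calc) {
    for (std::size_t i = 0; i < out.size(); ++i) { out[i] = calc(static_cast<int>(i)); }
  };
  if (preceding < 1) {
    fill(preceding_calculator<true>{offsets.data(), labels.data(), preceding});
  } else {
    fill(preceding_calculator<false>{offsets.data(), labels.data(), preceding});
  }
  return out;
}
```

The trade-off is that each value of the template parameter produces its own instantiation of everything that depends on the functor type, which is why compile time is worth watching with this kind of refactor.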
Has there been a tangible benefit to runtime perf with this change?
Sorry if the refactor isn't immediately obvious. This commit shows the refactor: 589f2a5
Ah, I don't have a benchmark in place for this.
I did assume that lifting this `if` check out would be beneficial.
(Sorry for the delayed response. GitHub reply notifications seem to be delayed.)
Would you rather I postponed this refactor until I have confirmed the advantage in a benchmark?
I'd be happy to do that if it helps get the main feature in.
No, it's totally fine to get this in as-is as long as there's no significant change to compile times.
Thanks for accommodating, @divyegala.
I've measured how much longer `grouped_rolling.cu` takes to compile, with all the changes in this PR thrown in.
Old: 98s
New: 93s
It's taking less time to compile now. This might have to do with my materializing the preceding/following columns, to simplify the iterators passed to `cudf::detail::rolling_window`.
As an aside, the compile time for `rolling_fixed_window.cu` has reduced from 80s to 65s.
So @divyegala's instinct was right, to keep an eye on the compile times. I think I've mitigated it for now.
Ah, I haven't updated the header yet. Will do shortly.
This now explains the semantics for negative window bounds.
I've updated the docs in
/ok to test
/merge