`mutate(.by_row =)`, `reframe(.by_row =)`, and possibly `filter(.by_row =)` #6660
Comments
Sounds good!

I like this a lot. Reading the first part, I thought about a [...] So, now wandering in another direction, which I know is a bit silly, but what if [...]
I like the idea of automatically wrapping scalars in a list. This is the sort of thing that vctrs makes possible in a predictable and consistent manner. However, I feel like we should commit to the argument syntax of [...] So in this case I'd like us to consider using an argument. It could be a simple boolean:

    df |> mutate(foo(bar), .by = baz)       # By group
    df |> mutate(foo(bar), .by_rows = TRUE) # By row

We could also add a variant of [...]

    # Like `.by_row` but `[` subsetting
    df |> mutate(foo(bar), .by_vector = 1:n())
    df |> summarise(foo(bar), .by_vector = cut(baz, 3))

In this case we'd end up with a trio of complementary arguments that change the semantics of evaluation: [...] I think using modifiers instead of variants fits the general evolution of the dplyr API, e.g. we've removed the suffixed variants of the verbs in favour of [...]
I'd be open to [...] I'm also slightly more empathetic to the idea of also adding this to [...]
Title changed from `mutate_row()` and `reframe_row()` to `mutate(.by_row =)`, `reframe(.by_row =)`, and possibly `filter(.by_row =)`.
I see that my suggestion for allowing [...] Anyway, just wanted to voice this. In the end I trust your judgment and will hold my peace regarding this issue forevermore. Thanks for the ongoing dedication to and improvement of [...]

Maybe [...] Currently, if there is a column which is unique I will use that, or if I am sure that there are no duplicate rows then [...]
Related to #4723
With the introduction of `.by`, it seems reasonable to once again reconsider `rowwise()` as well. I think we are convinced that the idea of rowwise is useful, but the implementation could possibly be improved. A few pain points:

- `rowwise()` is a form of persistent grouping, but you rarely want it on for more than 1 operation
- `ungroup()` is an odd verb for turning off rowwise behavior
- `summarise(model = list(lm(...)))`, i.e. the `list()` wrapping is manual
- `rowwise_df` class is difficult and error prone for us
- [...] `mutate()` and `reframe()`.

With that in mind, I'd like to suggest a two-part replacement for `rowwise()`:

- `mutate_row()` and `reframe_row()`. These become the only two places in dplyr where rowwise behavior is applicable.
- Give `mutate()`, `summarise()`, `reframe()`, `mutate_row()`, and `reframe_row()` the ability to automatically wrap scalars in a list, i.e. if `vec_is(elt)` is `FALSE`, wrap automatically into a list. This means that value could never exist in a data frame column as is, so there is no ambiguity about wrapping and it is fairly easy to explain.

Those two proposals result in the following new patterns:
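As a rough illustration (not the author's original example), the sketch below shows the current idiom these proposals target: rowwise on, manual `list()` wrapping, rowwise off. Next to it is a hypothetical `wrap_if_scalar()` helper that mimics the proposed auto-wrapping rule. The helper, the `df` data, and the variable names are all assumptions made for this example; only `rowwise()`, `mutate()`, `ungroup()`, and `vctrs::vec_is()` are existing functions.

```r
library(dplyr)

# Hypothetical helper, not part of dplyr or vctrs: anything that is not a
# vector according to vctrs::vec_is() gets wrapped in a list.
wrap_if_scalar <- function(elt) {
  if (vctrs::vec_is(elt)) elt else list(elt)
}

# Illustrative data: one nested data frame per cylinder count.
df <- tibble(
  cyl = c(4, 6),
  data = list(mtcars[mtcars$cyl == 4, ], mtcars[mtcars$cyl == 6, ])
)

# Current idiom: turn rowwise on, wrap the model in list() by hand,
# then turn rowwise off again with ungroup().
models <- df |>
  rowwise() |>
  mutate(model = list(lm(mpg ~ wt, data = data))) |>
  ungroup()

# What the auto-wrapping rule would do to each kind of result:
fit <- models$model[[1]]
wrapped <- wrap_if_scalar(fit)    # an lm object is not a vector, so it is wrapped
unchanged <- wrap_if_scalar(1:3)  # an integer vector is returned as is
```

Under the proposal, the `list()` call and the `rowwise()` / `ungroup()` pair would presumably disappear from the user's code; how exactly is up to the final design.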
This two part proposal has the very nice property that the difference between `mutate()` and `mutate_row()` becomes purely about column access:

- `mutate()` accesses columns using `vec_slice()` / `[`
- `mutate_row()` accesses columns using `vec_slice2()` / `[[`

In other words, rowwise has nothing to do with the output type of each column expression, and you still get useful results.
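A small sketch of that column-access difference, using verbs that exist today. Here `rowwise()` stands in for the proposed `mutate_row()`, which is an assumption since `mutate_row()` is only a proposal; `df`, `whole`, and `per_row` are illustrative names.

```r
library(dplyr)

df <- tibble(id = 1:2, x = list(1:3, 4:9))

# Ungrouped: `x` is the list column itself (as with `[`), so length(x)
# is the number of rows in the data frame.
whole <- df |> mutate(n = length(x))

# Rowwise: `x` is a single element pulled out of the list column (as with
# `[[`), so length(x) is the length of that row's vector.
per_row <- df |>
  rowwise() |>
  mutate(n = length(x)) |>
  ungroup()
```

The expression `length(x)` is identical in both calls; only what `x` refers to changes.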
In terms of other invariants, there is one related to `vec_size()`:

- `mutate_row()` requires each expression to return an element of `vec_size() == 1`
- `reframe_row()` allows each expression to return an element of any size

Other niceties:

- [...] `.by` being in the verb)

Extra notes:
- `mutate_row()` and `reframe_row()` won't get `.by` because they operate "by row"
- [...] `.by` about rowwise behavior, like `.by = .row` or something. We want `.by` to be pure tidyselect. Plus this special behavior would only apply for `mutate()` and `reframe()` and that would be very confusing.
- `summarise_row()`. This would have the exact same semantics as `mutate_row()`, but would just drop unused columns (which can mostly be done with `.keep` in `mutate_row()`). In particular `summarise_row()` and `mutate_row()` would both have to have the `vec_size() == 1` invariant from above, so we really don't need both.
- `filter_row()`. The only useful thing I can think of is something like `filter_row(!is.null(model))` for filtering out `NULL` list elements. But you can do that way more efficiently with an ungrouped call to `filter(!funs::is_na(model))`.
- `mutate_row()` and `reframe_row()` mostly have the semantics of the wrappers below, but this doesn't do the automatic list-wrapping of scalars:
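A minimal sketch of what such wrappers might look like, assuming they simply chain the existing `rowwise()`, `mutate()` or `reframe()`, and `ungroup()` verbs. This is a guess at the shape rather than the author's code: `mutate_row()` and `reframe_row()` are hypothetical here and do not exist in dplyr, and the example data is made up. As noted above, nothing in this sketch wraps scalars automatically.

```r
library(dplyr)

# Hypothetical wrappers; neither function exists in dplyr.
mutate_row <- function(.data, ...) {
  .data |> rowwise() |> mutate(...) |> ungroup()
}

reframe_row <- function(.data, ...) {
  .data |> rowwise() |> reframe(...) |> ungroup()
}

df <- tibble(id = 1:2, n = c(2L, 3L))

# One value per row satisfies the size-1 requirement.
one_each <- df |> mutate_row(total = sum(seq_len(n)))

# Several values per row violate it, so rowwise mutate() raises an error.
too_long <- tryCatch(
  df |> mutate_row(value = seq_len(n)),
  error = function(e) "error"
)

# The reframe variant accepts any size: 2 + 3 values give 5 rows.
expanded <- df |> reframe_row(id = id, value = seq_len(n))
```

The last two calls show the `vec_size()` invariant in action: the same expression fails under the mutate-style wrapper and expands the data frame under the reframe-style one.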