Releases: tidyverse/dplyr
dplyr 1.0.5
- Fixed edge case of slice_sample() when weight_by= is used and there are 0 rows (#5729).
- across() can again use columns in functions defined inline (#5734).
- Using testthat 3rd edition.
- Fixed bugs introduced in across() in previous version (#5765).
- group_by() keeps attributes unrelated to the grouping (#5760).
- The .cols= argument of if_any() and if_all() defaults to everything().
dplyr 1.0.4
- Improved performance for across(). This makes summarise(across()) and mutate(across()) perform as well as the superseded colwise equivalents (#5697).
- summarise() silently ignores NULL results (#5708).
- Fixed a performance regression in mutate() when warnings occur once per group (#5675). We no longer instrument warnings with debugging information when mutate() is called within suppressWarnings().
dplyr 1.0.3
- summarise() no longer informs when the result is ungrouped (#5633).
- group_by(.drop = FALSE) preserves ordered factors (@brianrice2, #5545).
- count() and tally() are now generic.
- Removed default fallbacks to lazyeval methods; this will yield better error messages when you call a dplyr function with the wrong input, and is part of our long term plan to remove the deprecated lazyeval interface.
- inner_join() gains a keep parameter for consistency with the other mutating joins (@patrickbarks, #5581).
- Improved performance with many columns, with a dynamic data mask using active bindings and lazy chops (#5017).
- mutate() and friends preserve row names in data frames once more (#5418).
- group_by() uses the ungrouped data for the implicit mutate step (#5598). You might have to define an ungroup() method for custom classes. For example, see hadley/cubelyr#3.
- relocate() can rename columns it relocates (#5569).
- distinct() and group_by() have better error messages when the mutate step fails (#5060).
- Clarify that between() is not vectorised (#5493).
- Fixed across() issue where data frame columns could not be referred to with all_of() in the nested case (mutate() within mutate()) (#5498).
- across() handles data frames with 0 columns (#5523).
- mutate() always keeps grouping variables, regardless of .keep= (#5582).
- dplyr now depends on R 3.3.0.
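Two of the 1.0.3 changes above are easiest to see on a small example. The sketch below assumes two toy tibbles, x and y, invented purely for illustration; it shows inner_join() keeping both copies of the join key with keep = TRUE, and relocate() renaming a column as it moves it.

```r
library(dplyr)

# Hypothetical toy inputs; the column names are illustrative only.
x <- tibble(id = c(1, 2, 3), a = c("p", "q", "r"))
y <- tibble(id = c(2, 3, 4), b = c(TRUE, FALSE, TRUE))

# keep = TRUE retains the join key from both inputs (suffixed .x and .y).
joined <- inner_join(x, y, by = "id", keep = TRUE)
print(names(joined))

# relocate() accepts new_name = old_name, so it can rename while moving.
moved <- relocate(x, label = a, .before = id)
print(names(moved))
```

Without keep = TRUE, the join would return a single id column.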
dplyr 1.0.2
dplyr 1.0.1
- New function cur_data_all() similar to cur_data() but includes the grouping variables (#5342).
- count() and tally() no longer automatically weight by column n if present (#5298). dplyr 1.0.0 introduced this behaviour because of Hadley's faulty memory. Historically tally() automatically weighted and count() did not, but this behaviour was accidentally changed in 0.8.2 (#4408) so that neither automatically weighted by n. Since 0.8.2 is almost a year old, and the automatic weighting behaviour was a little confusing anyway, we've removed it from both count() and tally(). Use of wt = n() is now deprecated; now just omit the wt argument.
- coalesce() now supports data frames correctly (#5326).
- cummean() no longer has off-by-one indexing problem (@Cropgen, #5287).
- The call stack is preserved on error. This makes it possible to recover() into problematic code called from dplyr verbs (#5308).
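The count() and tally() weighting change described above can be checked with a data frame that happens to contain a column called n. This is a minimal sketch on made-up data: plain count() counts rows, and weighting by n has to be requested explicitly with wt = n.

```r
library(dplyr)

# Hypothetical data with a pre-existing column named n.
df <- tibble(g = c("a", "a", "b"), n = c(10, 20, 30))

# No automatic weighting: this counts rows per group.
row_counts <- count(df, g)
print(row_counts)

# Weighting by the existing n column must be asked for explicitly.
weighted <- count(df, g, wt = n)
print(weighted)
```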
dplyr 1.0.0
Breaking changes
- bind_cols() no longer converts to a tibble; it returns a data frame if the input is a data frame.
- bind_rows(), *_join(), summarise() and mutate() use vctrs coercion rules. There are two main user facing changes:
  - Combining factor and character vectors silently creates a character vector; previously it created a character vector with a warning.
  - Combining multiple factors creates a factor with combined levels; previously it created a character vector with a warning.
- bind_rows() and other functions use vctrs name repair, see ?vctrs::vec_as_names.
- all.equal.tbl_df() removed.
  - Data frames, tibbles and grouped data frames are no longer considered equal, even if the data is the same.
  - Equality checks for data frames no longer ignore row order or groupings.
  - expect_equal() uses all.equal() internally. When comparing data frames, tests that used to pass may now fail.
- distinct() keeps the original column order.
- distinct() on missing columns now raises an error; it has been a compatibility warning for a long time.
- group_modify() puts the grouping variable to the front.
- n() and row_number() can no longer be called directly when dplyr is not loaded, and this now generates an error: dplyr::mutate(mtcars, x = n()). Fix by prefixing with dplyr:: as in dplyr::mutate(mtcars, x = dplyr::n()).
- The old data format for grouped_df is no longer supported. This may affect you if you have serialized grouped data frames to disk, e.g. with saveRDS() or when using knitr caching.
- lead() and lag() are stricter about their inputs.
- Extending data frames requires that the extra class or classes are added first, not last. Having the extra class at the end causes some vctrs operations to fail with a message like: Input must be a vector, not a `<data.frame/...>` object
- right_join() no longer sorts the rows of the resulting tibble according to the order of the RHS by argument in tibble y.
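The two vctrs coercion rules listed above can be illustrated with bind_rows(). The sketch assumes three single-row tibbles (f1, f2, ch) that are invented for the example; it also repeats the namespaced dplyr::n() call from the notes.

```r
library(dplyr)

# Hypothetical single-row inputs.
f1 <- tibble(x = factor("a"))
f2 <- tibble(x = factor("b"))
ch <- tibble(x = "c")

# Factor + factor: the result is a factor whose levels are combined.
both_factors <- bind_rows(f1, f2)
print(levels(both_factors$x))

# Factor + character: the result is a character vector, with no warning.
mixed <- bind_rows(f1, ch)
print(class(mixed$x))

# Fully namespaced call, which also works when dplyr is not attached.
with_n <- dplyr::mutate(mtcars, x = dplyr::n())
print(head(with_n$x))
```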
New features
- The cur_ functions (cur_data(), cur_group(), cur_group_id(), cur_group_rows()) provide a full set of options to access information about the "current" group in dplyr verbs. They are inspired by data.table's .SD, .GRP, .BY, and .I.
- The rows_ functions (rows_insert(), rows_update(), rows_upsert(), rows_patch(), rows_delete()) provide a new API to insert and delete rows from a second data frame or table. Support for updating mutable backends is planned (#4654).
- mutate() and summarise() create multiple columns from a single expression if you return a data frame (#2326).
- select() and rename() use the latest version of the tidyselect interface. Practically, this means that you can now combine selections using Boolean logic (i.e. !, & and |), and use predicate functions with where() (e.g. where(is.character)) to select variables by type (#4680). It also makes it possible to use select() and rename() to repair data frames with duplicated names (#4615) and prevents you from accidentally introducing duplicate names (#4643). This also means that dplyr now re-exports any_of() and all_of() (#5036).
- slice() gains a new set of helpers:
  - slice_head() and slice_tail() select the first and last rows, like head() and tail(), but return n rows per group.
  - slice_sample() randomly selects rows, taking over from sample_frac() and sample_n().
  - slice_min() and slice_max() select the rows with the minimum or maximum values of a variable, taking over from the confusing top_n().
- summarise() can create summaries of greater than length 1 if you use a summary function that returns multiple values.
- summarise() gains a .groups= argument to control the grouping structure.
- New relocate() verb makes it easy to move columns around within a data frame (#4598).
- New rename_with() is designed specifically for the purpose of renaming selected columns with a function (#4771).
- ungroup() can now selectively remove grouping variables (#3760).
- pull() can now return named vectors by specifying an additional column name (@ilarischeinin, #4102).
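Several of the new features above can be tried together on one small table. The following is a minimal sketch on made-up data (df and patch are hypothetical names), covering slice_max(), where(), summarise(.groups =), rename_with() and rows_update().

```r
library(dplyr)

# Hypothetical example data.
df <- tibble(
  id = 1:5,
  g = c("a", "a", "b", "b", "b"),
  x = c(1, 5, 2, 8, 3)
)

# slice_max() keeps the row with the largest x within each group.
top <- df %>% group_by(g) %>% slice_max(x, n = 1) %>% ungroup()
print(top)

# where() selects columns with a predicate function.
chr_cols <- select(df, where(is.character))
print(names(chr_cols))

# .groups = "drop" returns an ungrouped summary.
totals <- df %>% group_by(g) %>% summarise(total = sum(x), .groups = "drop")
print(totals)

# rename_with() applies a function to the selected column names.
renamed <- rename_with(df, toupper, where(is.numeric))
print(names(renamed))

# rows_update() overwrites matching rows using values from a second table.
patch <- tibble(id = 2L, x = 50)
updated <- rows_update(df, patch, by = "id")
print(updated)
```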
Experimental features
- mutate() (for data frames only) gains experimental new arguments .before and .after that allow you to control where the new columns are placed (#2047).
- mutate() (for data frames only) gains an experimental new argument called .keep that allows you to control which variables are kept from the input .data. .keep = "all" is the default; it keeps all variables. .keep = "none" retains no input variables (except for grouping keys), so behaves like transmute(). .keep = "unused" keeps only variables not used to make new columns. .keep = "used" keeps only the input variables used to create new columns; it's useful for double checking your work (#3721).
- New, experimental, with_groups() makes it easy to temporarily group or ungroup (#4711).
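The .keep and .after arguments described above are easiest to compare side by side. This sketch uses a hypothetical three-column tibble and inspects the resulting column names for each setting.

```r
library(dplyr)

# Hypothetical input with one column (c) that the new expression never uses.
df <- tibble(a = 1:3, b = 4:6, c = 7:9)

# "used" keeps only the inputs that the new column was computed from.
used <- mutate(df, s = a + b, .keep = "used")
print(names(used))

# "unused" keeps only the inputs that were not referenced.
unused <- mutate(df, s = a + b, .keep = "unused")
print(names(unused))

# "none" drops all (ungrouped) inputs, like transmute().
none <- mutate(df, s = a + b, .keep = "none")
print(names(none))

# .after controls where the new column is placed.
placed <- mutate(df, s = a + b, .after = a)
print(names(placed))
```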
across()
- New function across() that can be used inside summarise(), mutate(), and other verbs to apply a function (or a set of functions) to a selection of columns. See vignette("colwise") for more details.
- New function c_across() that can be used inside summarise() and mutate() in row-wise data frames to easily (e.g.) compute a row-wise mean of all numeric variables. See vignette("rowwise") for more details.
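As a rough illustration of the two functions above, the sketch below applies mean() to two columns per group with across(), then computes a per-row total with c_across() on a row-wise data frame. The data and column names are invented for the example.

```r
library(dplyr)

# Hypothetical example data.
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3), y = c(10, 20, 30))

# across(): apply one function to a selection of columns within summarise().
means <- df %>%
  group_by(g) %>%
  summarise(across(c(x, y), mean), .groups = "drop")
print(means)

# c_across(): combine values from several columns within each row.
totals <- df %>%
  rowwise() %>%
  mutate(total = sum(c_across(c(x, y)))) %>%
  ungroup()
print(totals)
```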
rowwise()
- rowwise() is no longer questioning; we now understand that it's an important tool when you don't have vectorised code. It now also allows you to specify additional variables that should be preserved in the output when summarising (#4723). The rowwise-ness is preserved by all operations; you need to explicitly drop it with as_tibble() or group_by().
- New, experimental, nest_by(). It has the same interface as group_by(), but returns a rowwise data frame of grouping keys, supplemented with a list-column of data frames containing the rest of the data.
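A small sketch of nest_by(), on made-up data, may help here: the result has one row per group, and because it is row-wise, the data list-column can be used directly as a data frame inside mutate(). The final step drops the row-wise class with as_tibble(), as the note above describes.

```r
library(dplyr)

# Hypothetical example data.
df <- tibble(g = c("a", "a", "b"), x = c(1, 2, 3))

# One row per group; the remaining columns are nested in a list-column.
nested <- nest_by(df, g)
print(names(nested))

# In a row-wise data frame, `data` refers to the current row's data frame.
sizes <- mutate(nested, n_rows = nrow(data))
print(sizes$n_rows)

# Row-wise-ness persists until it is dropped explicitly.
plain <- as_tibble(sizes)
print(class(plain))
```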
vctrs
- The implementation of all dplyr verbs has been changed to use primitives provided by the vctrs package. This makes it easier to add support for new types of vector, radically simplifies the implementation, and makes all dplyr verbs more consistent.
- The place where you are most likely to be impacted by the coercion changes is when working with factors in joins or grouped mutates: now when combining factors with different levels, dplyr creates a new factor with the union of the levels. This matches base R more closely, and while perhaps strictly less correct, is much more convenient.
- dplyr dropped its two heaviest dependencies: Rcpp and BH. This should make it considerably easier and faster to build from source.
- The implementation of all verbs has been carefully thought through. This mostly makes implementation simpler but should hopefully increase consistency, and also makes it easier to adapt dplyr to new data structures in the near future. Pragmatically, the biggest difference for most people will be that each verb documents its return value in terms of rows, columns, groups, and data frame attributes.
- Row names are now preserved when working with data frames.
Grouping
- group_by() uses hashing from the vctrs package.
- Grouped data frames now have names<-, [[<-, [<- and $<- methods that re-generate the underlying grouping. Note that modifying grouping variables in multiple steps (i.e. df$grp1 <- 1; df$grp2 <- 1) will be inefficient since the data frame will be regrouped after each modification.
- [.grouped_df now regroups to respect any grouping columns that have been removed (#4708).
- mutate() and summarise() can now modify grouping variables (#4709).
- group_modify() works with additional arguments (@billdenney and @cderv, #4509).
- group_by() does not create an arbitrary NA group when grouping by factors with drop = TRUE (#4460).
Lifecycle changes
- All deprecations now use the lifecycle package; that means by default you'll only see a deprecation warning once per session, and you can control this with options(lifecycle_verbosity = x) where x is one of NULL, "quiet", "warning", and "error".
Removed
- id(), deprecated in dplyr 0.5.0, is now defunct.
- failwith(), deprecated in dplyr 0.7.0, is now defunct.
- tbl_cube() and nasa have been pulled out into a separate cubelyr package (#4429).
- rbind_all() and rbind_list() have been removed (@bjungbogati, #4430).
- dr_dplyr() has been removed as it is no longer needed (#4433, @smwindecker).
Deprecated
- Use of pkgconfig for setting na_matches argument to join functions is now deprecated (#4914). This was rarely used, and I'm now confident that the default is correct for R.
- In add_count(), the drop argument has been deprecated because it didn't actually affect the output.
- add_rownames(): please use tibble::rownames_to_column() instead.
- as.tbl() and tbl_df(): please use as_tibble() instead.
- bench_tbls(), compare_tbls(), compare_tbls2(), eval_tbls() and eval_tbls2() are now deprecated. These were only used in a handful of packages, and we now believe that you're better off performing comparisons more directly (#4675).
- combine(): please use vctrs::vec_c() instead.
- funs(): please use list() instead.
- group_by(add = ): please use .add instead.
- group_by(.dots = ) / group_by_prepare(.dots = ): please use !!! inst...
v0.8.5
dplyr 0.8.4
- Adapt tests to changes in dependent packages.
dplyr 0.8.3
- Fixed performance regression introduced in version 0.8.2 (#4458).
dplyr 0.8.2
New functions
- top_frac(data, proportion) is a shorthand for top_n(data, proportion * n()) (#4017).
colwise changes
- Using quosures in colwise verbs is deprecated (#4330).
- Updated distinct_if(), distinct_at() and distinct_all() to include .keep_all argument (@beansrowning, #4343).
- rename_at() handles empty selection (#4324).
- *_if() functions correctly handle columns with special names (#4380).
- colwise functions support constants in formulas (#4374).
Hybrid evaluation changes
- hybrid rank functions correctly handle NA (#4427).
- The hybrid versions of first(), last() and nth() handle factors (#4295).
Minor changes
- top_n() quotes its n argument; n no longer needs to be constant for all groups (#4017).
- tbl_vars() keeps information on grouping columns by returning a dplyr_sel_vars object (#4106).
- group_split() always sets the ptype attribute, which makes it more robust in the case where there are 0 groups.
- group_map() and group_modify() work in the 0 group edge case (#4421).
- select.list() method added so that select() does not dispatch on lists (#4279).
- view() is reexported from tibble (#4423).
- group_by() puts NA groups last in character vectors (#4227).
- arrange() handles integer64 objects (#4366).
- summarise() correctly resolves summarised list columns (#4349).