The between() vector function efficiently determines whether numeric values
fall in a range, and is translated to a special form for SQL (#503).
count() makes it even easier to do (weighted) counts (#358).
data_frame() by @kevinushey is a nicer way of creating data frames.
It never coerces column types (no more stringsAsFactors = FALSE!),
never munges column names, and never adds row names. You can use previously
defined columns to compute new columns (#376).
distinct() returns distinct (unique) rows of a tbl (#97). Supply
additional variables to return the first row for each unique combination
of variables.
The set operations intersect(), union() and setdiff() now have methods
for data frames, data tables and SQL database tables (#93). They pass their
arguments down to the base functions, which ensures they raise errors if
you pass in too many arguments.
Joins (e.g. left_join(), inner_join(), semi_join(), anti_join())
now allow you to join on different variables in x and y tables by
supplying a named vector to by. For example, by = c("a" = "b") joins x.a to y.b.
n_groups() tells you how many groups there are in a tbl. It returns
1 for ungrouped data (#477).
transmute() works like mutate() but drops all variables that you didn't
explicitly refer to (#302).
rename() makes it easy to rename variables - it works similarly to select() but it preserves columns that you didn't otherwise touch.
slice() allows you to select rows by position (#226). Positive integers
keep rows, negative integers drop them, and you can use expressions like n().
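As a rough illustration of the new verbs listed above, here is a minimal sketch. The data frames x and y and their columns are hypothetical, invented only for this example, and the code assumes a reasonably recent dplyr installation.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only.
x <- data.frame(a = c(1, 2, 2, 3), g = c("u", "v", "v", "w"),
                stringsAsFactors = FALSE)
y <- data.frame(b = c(2, 3, 4), label = c("two", "three", "four"),
                stringsAsFactors = FALSE)

# between(): inclusive range test on a numeric vector.
in_range <- between(x$a, 2, 3)

# count(): number of rows for each value of g.
counts <- count(x, g)

# distinct(): unique rows of the table.
uniq <- distinct(x)

# Join on differently named keys: x.a is matched to y.b.
joined <- left_join(x, y, by = c("a" = "b"))

# transmute() keeps only the variables it creates;
# rename() keeps every column and only changes the name.
doubled <- transmute(x, a2 = a * 2)
renamed <- rename(x, id = a)

# slice(): select rows by position; n() is the number of rows.
first_two <- slice(x, 1:2)
last_row  <- slice(x, n())

# n_groups(): 1 for ungrouped data, otherwise the number of groups.
groups_plain   <- n_groups(x)
groups_grouped <- n_groups(group_by(x, g))
```

Each result is stored in its own variable so the individual verbs can be inspected one at a time.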
Programming with dplyr (non-standard evaluation)
You can now program with dplyr - every function that does non-standard
evaluation (NSE) has a standard evaluation (SE) version ending in _.
This is powered by the new lazyeval package which provides all the tools
needed to implement NSE consistently and correctly.
See vignette("nse") for full details.
regroup() is deprecated. Please use the more flexible group_by_()
instead.
summarise_each_q() and mutate_each_q() are deprecated. Please use summarise_each_() and mutate_each_() instead.
funs_q has been replaced with funs_.
Removed and deprecated features
%.% has been deprecated: please use %>% instead. chain() is
defunct. (#518)
filter.numeric() has been removed. It still needs to be reimplemented with
the new lazy eval system.
The Progress refclass is no longer exported to avoid conflicts with shiny.
Instead use progress_estimated() (#535).
src_monetdb() is now implemented in MonetDB.R, not dplyr.
show_sql() and explain_sql() and matching global options dplyr.show_sql
and dplyr.explain_sql have been removed. Instead use show_query() and explain().
Minor improvements and bug fixes
Main verbs now have individual documentation pages (#519).
%>% is simply re-exported from magrittr, instead of creating a local copy
(#496, thanks to @jimhester)
Examples now use nycflights13 instead of hflights because its variables
have better names and there are a few interlinked tables (#562).
Lahman and nycflights13 are (once again) suggested packages. This means many
examples will not work unless you explicitly install them with
install.packages(c("Lahman", "nycflights13")) (#508). dplyr now depends on
Lahman 3.0.1. A number of examples have been updated to reflect modified
field names (#586).
do() now displays the progress bar only when used in interactive prompts
and not when knitting (#428, @jimhester).
group_by() has more consistent behaviour when grouping by constants:
it creates a new column with that value (#410). It renames grouping
variables (#410). The first argument is now .data so you can create
new groups with name x (#534).
Instead of overriding lag(), dplyr now overrides lag.default(),
which should avoid clobbering lag methods added by other packages
(#277).
mutate(data, a = NULL) removes the variable a from the returned
dataset (#462).
trunc_mat() and hence print.tbl_df() and friends get a width argument
to control the default output width. Set options(dplyr.width = Inf) to
always show all columns (#589).
select() gains a one_of() selector, which allows you to select variables
provided by a character vector (#396). It fails immediately if you give an
empty pattern to starts_with(), ends_with(), contains() or matches()
(#481, @leondutoit). Fixed buglet in select() so that you can now create
variables called val (#564).
Switched from RC to R6.
tally() and top_n() work consistently: neither accidentally
evaluates the wt param (#426, @mnel).
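A few of the behaviours in the list above can be sketched as follows. The data frame df and the vector wanted are hypothetical names introduced for this example, which assumes a reasonably recent dplyr installation.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration only.
df <- data.frame(a = 1:3, b = 4:6, val = 7:9)

# mutate(data, a = NULL) removes the variable a from the result.
no_a <- mutate(df, a = NULL)

# one_of() selects the variables named in a character vector.
# A column called val can be selected like any other.
wanted <- c("a", "val")
picked <- select(df, one_of(wanted))

# Grouping by a constant creates a new column holding that value.
by_const <- group_by(df, k = 1)
```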
Databases
The db backend system has been completely overhauled in order to make
it possible to add backends in other packages, and to support a much
wider range of databases. See vignette("new-sql-backend") for instructions
on how to create your own (#568).
src_mysql() gains a method for explain().
When mutate() creates a new variable that uses a window function,
automatically wrap the result in a subquery (#484).
Correct SQL generation for first() and last() (#531).
order_by() now works in conjunction with window functions in databases
that support them.
Data frames/tbl_df
All verbs now understand how to work with difftime() (#390) and AsIs (#453) objects. They all check that colnames are unique (#483), and
are more robust when columns are not present (#348, #569, #600).
Hybrid evaluation bugs fixed:
Call substitution stopped too early when a sub expression contained a $ (#502).
nth() now correctly preserves the class when using dates, times and
factors (#509).
Hybrid evaluation no longer substitutes within order_by() because order_by()
needs to do its own NSE (#169).
[.tbl_df always returns a tbl_df (i.e. drop = FALSE is the default)
(#587, #610). [.grouped_df preserves important output attributes (#398).
arrange() keeps the grouping structure of grouped data (#491, #605),
and preserves input classes (#563).
contains() accidentally matched regular expressions, now it passes fixed = TRUE to grep() (#608).
filter() asserts that all variables are whitelisted (#566).
mutate() makes a rowwise_df when given a rowwise_df (#463).
rbind_all() creates tbl_df objects instead of raw data.frames.
If select() doesn't match any variables, it returns a 0-column data frame,
instead of the original (#498). It no longer fails when some columns
are not named (#492).
sample_n() and sample_frac() methods for data.frames are now exported
(#405, @alyst).
A grouped data frame may have 0 groups (#486). Grouped df objects
gain some basic validity checking, which should prevent some crashes
related to corrupt grouped_df objects made by rbind() (#606).
More coherence when joining columns of compatible but different types,
e.g. when joining a character vector and a factor (#455),
or a numeric and an integer (#450).
mutate() works on zero-row grouped data frames, and
with list columns (#555).
LazySubset was confused about input data size (#452).
Internal n_distinct() is stricter about its inputs: it requires one symbol
which must be from the data frame (#567).
rbind_*() handle data frames with 0 rows (#597). They fill character
vector columns with NA instead of blanks (#595). They work with
list columns (#463).
Improved handling of encoding for column names (#636).
Improved handling of hybrid evaluation re $ and @ (#645).
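Some of the data frame fixes above can be checked with a short sketch like the one below. The data frame df and its column names are hypothetical, chosen so that one name contains a literal dot, and the code assumes a reasonably recent dplyr installation.

```r
library(dplyr)

# Hypothetical toy data; check.names = FALSE keeps the names as written.
df <- data.frame(a.b = 1:3, axb = 3:1, g = c("p", "q", "p"),
                 check.names = FALSE, stringsAsFactors = FALSE)

# contains() matches literally, so the "." is not a regex wildcard
# and the column axb is not selected.
literal <- select(df, contains("a.b"))

# arrange() keeps the grouping structure of grouped data.
sorted <- arrange(group_by(df, g), axb)

# A select() that matches nothing returns a 0-column data frame.
none <- select(df, starts_with("zzz"))
```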
Data tables
Fix major omission in tbl_dt() and grouped_dt() methods - I was
accidentally doing a deep copy on every result :(
summarise() and group_by() now retain over-allocation when working with
data.tables (#475, @arunsrinivasan).
Joining two data.tables now correctly dispatches to data table methods,
and the result is a data table (#470).
Cubes
summarise.tbl_cube() works with single grouping variable (#480).