Release dplyr 0.4.3 · tidyverse/dplyr

Improved encoding support

Until now, dplyr's support for non-UTF8 encodings has been rather shaky. This release brings a number of improvements to fix these problems: it's probably not perfect, but it should be a lot better than the previous version. This includes fixes to arrange() (#1280), bind_rows() (#1265), distinct() (#1179), and joins (#1315). print.tbl_df() also received a fix for strings with invalid encodings (#851).

Other minor improvements and bug fixes

frame_data() provides a means for constructing data_frames using
a simple row-wise language. (#1358, @kevinushey)
all.equal() no longer runs all outputs together (#1130).
as_data_frame() gives a better error message with NA column names (#1101).
[.tbl_df is more careful about subsetting column names (#1245).
arrange() and mutate() work on empty data frames (#1142).
arrange(), filter(), slice(), and summarise() preserve data frame
meta attributes (#1064).
bind_rows() and bind_cols() accept lists (#1104): during initial data
cleaning you no longer need to convert lists to data frames, but can
instead feed them to bind_rows() directly.
bind_rows() gains a .id argument. When supplied, it creates a
new column that gives the name of each data frame (#1337, @lionel-).
bind_rows() respects the ordered attribute of factors (#1112), and
does better at comparing POSIXcts (#1125). The tz attribute is ignored
when determining if two POSIXct vectors are comparable. If the tz of
all inputs is the same, it's used; otherwise it's set to UTC.
data_frame() always produces a tbl_df (#1151, @kevinushey)
filter(x, TRUE, TRUE) now just returns x (#1210),
it doesn't internally modify the first argument (#971), and
it now works with rowwise data (#1099). It once again works with
data tables (#906).
glimpse() now prints the number of variables in addition to the number
of observations (@ilarischeinin, #988).
Joins handle matrix columns better (#1230), and can join Date objects
with heterogeneous representations (some Dates are integers, while others
are numeric). This also improves all.equal() (#1204).
Fixed percent_rank() and cume_dist() so that missing values no longer
affect the denominator (#1132).
print.tbl_df() now displays the class for all variables, not just those
that don't fit on the screen (#1276). It also displays duplicated column
names correctly (#1159).
print.grouped_df() now tells you how many groups there are.
mutate() can set to NULL the first column (used to segfault, #1329) and
it better protects intermediary results (avoiding random segfaults, #1231).
mutate() on grouped data handles the special case where for the first few
groups, the result consists of a logical vector with only NA. This can
happen when the condition of an ifelse is an all NA logical vector (#958).
mutate.rowwise_df() handles factors (#886) and correctly handles
0-row inputs (#1300).
n_distinct() gains an na_rm argument (#1052).
The progress bar used by do() now respects the global option
dplyr.show_progress (default is TRUE) so you can turn it off globally
(@jimhester #1264, #1226).
summarise() handles expressions that return heterogeneous outputs,
e.g. median(), which sometimes returns an integer and other times a
numeric (#893).
slice() silently drops columns corresponding to an NA (#1235).
ungroup.rowwise_df() gives a tbl_df (#936).
More explicit duplicated column name error message (#996).
When "," is already being used as the decimal point (getOption("OutDec")),
use "." as the thousands separator when printing out formatted numbers
(@ilarischeinin, #988).
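
A few of the changes above are easier to see in code. The following is a minimal sketch, not taken from the release itself: the data (batches, scores) is made up for illustration, and it only assumes that dplyr is installed. It passes a named list of data frames straight to bind_rows() with the new .id argument, then calls percent_rank() and cume_dist() on a vector containing a missing value.

```r
library(dplyr)

# Hypothetical example data: two small data frames held in a named list.
batches <- list(
  first  = data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE),
  second = data.frame(x = 3L, y = "c", stringsAsFactors = FALSE)
)

# The list is passed directly, with no manual conversion step;
# .id adds a column holding the name of the data frame each row came from.
combined <- bind_rows(batches, .id = "source")

# Hypothetical scores with one missing value. The NA stays NA in the
# result and is not counted in the denominator for the other values.
scores <- c(10, 20, NA, 30)
pr <- percent_rank(scores)
cd <- cume_dist(scores)
```

Here combined has three rows and a source column identifying the originating list element, which is handy when the inputs are per-file or per-batch results.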

Databases

db_query_fields.SQLiteConnection uses build_sql rather than paste0
(#926, @NikNakk)
Improved handling of log() (#1330).
n_distinct(x) is translated to COUNT(DISTINCT(x)) (@skparkes, #873).
print(n = Inf) now works for remote sources (#1310).

Hybrid evaluation

Hybrid evaluation does not take place for objects with a class (#1237).
Improved $ handling (#1134).
Simplified code for lead() and lag() and made sure they work properly on
factors (#955). Both respect the default argument (#915).
mutate can set to NULL the first column (used to segfault, #1329).
filter on grouped data handles indices correctly (#880).
sum() issues a warning about integer overflow (#1108).
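
As a small illustration of the lead() and lag() items above, here is a hedged sketch with made-up vectors (x and f are hypothetical, and only an installed dplyr is assumed). It shows the default argument filling the positions that have no neighbour, and lag() applied to a factor.

```r
library(dplyr)

# Hypothetical numeric vector; default fills the slot with no neighbour.
x <- c(1, 2, 3)
lagged <- dplyr::lag(x, default = 0)   # shifts values one step later
led    <- dplyr::lead(x, default = 0)  # shifts values one step earlier

# Hypothetical factor; the result should still be a factor, with NA
# in the first position because no default is supplied.
f <- factor(c("a", "b", "c"))
f_lag <- dplyr::lag(f)
```

The calls are namespaced as dplyr::lag() because base R's stats package also exports a lag() function with different behaviour.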
