dplyr 0.4.3
Improved encoding support
Until now, dplyr's support for non-UTF8 encodings has been rather shaky. This release brings a number of improvements to fix these problems: it's probably not perfect, but should be a lot better than the previous version. This includes fixes to arrange() (#1280), bind_rows() (#1265), distinct() (#1179), and joins (#1315). print.tbl_df() also received a fix for strings with invalid encodings (#851).
Other minor improvements and bug fixes
- frame_data() provides a means for constructing data_frames using a simple row-wise language (#1358, @kevinushey).
- all.equal() no longer runs all outputs together (#1130).
- as_data_frame() gives a better error message with NA column names (#1101).
- [.tbl_df is more careful about subsetting column names (#1245).
- arrange() and mutate() work on empty data frames (#1142).
- arrange(), filter(), slice(), and summarise() preserve data frame meta attributes (#1064).
- bind_rows() and bind_cols() accept lists (#1104): during initial data cleaning you no longer need to convert lists to data frames, but can instead feed them to bind_rows() directly.
- bind_rows() gains a .id argument. When supplied, it creates a new column that gives the name of each data frame (#1337, @lionel-).
- bind_rows() respects the ordered attribute of factors (#1112), and does better at comparing POSIXcts (#1125). The tz attribute is ignored when determining if two POSIXct vectors are comparable. If the tz of all inputs is the same, it's used, otherwise it's set to UTC.
- data_frame() always produces a tbl_df (#1151, @kevinushey).
- filter(x, TRUE, TRUE) now just returns x (#1210), it doesn't internally modify the first argument (#971), and it now works with rowwise data (#1099). It once again works with data tables (#906).
- glimpse() also prints out the number of variables in addition to the number of observations (@ilarischeinin, #988).
- Joins handle matrix columns better (#1230), and can join Date objects with heterogeneous representations (some Dates are integers, while others are numeric). This also improves all.equal() (#1204).
- Fixed percent_rank() and cume_dist() so that missing values no longer affect the denominator (#1132).
- print.tbl_df() now displays the class for all variables, not just those that don't fit on the screen (#1276). It also displays duplicated column names correctly (#1159).
- print.grouped_df() now tells you how many groups there are.
- mutate() can set the first column to NULL (used to segfault, #1329) and it better protects intermediary results (avoiding random segfaults, #1231).
- mutate() on grouped data handles the special case where, for the first few groups, the result consists of a logical vector with only NA. This can happen when the condition of an ifelse is an all-NA logical vector (#958).
- mutate.rowwise_df() handles factors (#886) and correctly handles 0-row inputs (#1300).
- n_distinct() gains an na_rm argument (#1052).
- The Progress bar used by do() now respects the global option dplyr.show_progress (default is TRUE) so you can turn it off globally (@jimhester #1264, #1226).
- summarise() handles expressions that return heterogeneous outputs, e.g. median(), which sometimes returns an integer, and other times a numeric (#893).
- slice() silently drops columns corresponding to an NA (#1235).
- ungroup.rowwise_df() gives a tbl_df (#936).
- More explicit duplicated column name error message (#996).
- When "," is already being used as the decimal point (getOption("OutDec")), use "." as the thousands separator when printing out formatted numbers (@ilarischeinin, #988).
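
As a rough illustration of two of the changes above, the sketch below passes a named list of data frames straight to bind_rows() with the new .id argument, and then switches off the do() progress bar through the dplyr.show_progress option. The data frames, the list names, and the column name "source" are made up for the example; only bind_rows(), .id, and the option name come from these notes.

```r
library(dplyr)

# Two small data frames with the same columns (illustrative data only).
first  <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)
second <- data.frame(x = 3:4, y = c("c", "d"), stringsAsFactors = FALSE)

# A named list can be handed to bind_rows() directly; .id gives the name of
# the new column that records which list element each row came from.
# "source" is an arbitrary column name chosen for this sketch.
combined <- bind_rows(list(first = first, second = second), .id = "source")

# Turn off the progress bar used by do() for the rest of the session.
options(dplyr.show_progress = FALSE)
```

Naming the list elements is what makes the .id column informative here: each row is labelled with the name of the data frame it came from.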
Databases
- db_query_fields.SQLiteConnection uses build_sql rather than paste0 (#926, @NikNakk).
- Improved handling of log() (#1330).
- n_distinct(x) is translated to COUNT(DISTINCT(x)) (@skparkes, #873).
- print(n = Inf) now works for remote sources (#1310).
Hybrid evaluation
- Hybrid evaluation does not take place for objects with a class (#1237).
- Improved $ handling (#1134).
- Simplified code for lead() and lag() and made sure they work properly on factors (#955). Both respect the default argument (#915).
- mutate can set the first column to NULL (used to segfault, #1329).
- filter on grouped data handles indices correctly (#880).
- sum() issues a warning about integer overflow (#1108).