dplyr 0.4.3

@hadley hadley released this 01 Sep 16:21

Improved encoding support

Until now, dplyr's support for non-UTF8 encodings has been rather shaky. This release brings a number of improvements to fix these problems: it's probably not perfect, but it should be a lot better than the previous version. This includes fixes to arrange() (#1280), bind_rows() (#1265), distinct() (#1179), and joins (#1315). print.tbl_df() also received a fix for strings with invalid encodings (#851).

Other minor improvements and bug fixes

  • frame_data() provides a means for constructing data_frames using
    a simple row-wise language. (#1358, @kevinushey)
  • all.equal() no longer runs all outputs together (#1130).
  • as_data_frame() gives a better error message with NA column names (#1101).
  • [.tbl_df is more careful about subsetting column names (#1245).
  • arrange() and mutate() work on empty data frames (#1142).
  • arrange(), filter(), slice(), and summarise() preserve data frame
    meta attributes (#1064).
  • bind_rows() and bind_cols() accept lists (#1104): during initial data
    cleaning you no longer need to convert lists to data frames, but can
    instead feed them to bind_rows() directly.
  • bind_rows() gains a .id argument. When supplied, it creates a
    new column that gives the name of each data frame (#1337, @lionel-).
  • bind_rows() respects the ordered attribute of factors (#1112), and
    does better at comparing POSIXcts (#1125). The tz attribute is ignored
    when determining if two POSIXct vectors are comparable. If the tz of
    all inputs is the same, it's used; otherwise it's set to UTC.
  • data_frame() always produces a tbl_df (#1151, @kevinushey)
  • filter(x, TRUE, TRUE) now just returns x (#1210),
    it doesn't internally modify the first argument (#971), and
    it now works with rowwise data (#1099). It once again works with
    data tables (#906).
  • glimpse() now prints the number of variables in addition to the number
    of observations (@ilarischeinin, #988).
  • Joins handle matrix columns better (#1230), and can join Date objects
    with heterogeneous representations (some Dates are integers, while others
    are numeric). This also improves all.equal() (#1204).
  • Fixed percent_rank() and cume_dist() so that missing values no longer
    affect denominator (#1132).
  • print.tbl_df() now displays the class for all variables, not just those
    that don't fit on the screen (#1276). It also displays duplicated column
    names correctly (#1159).
  • print.grouped_df() now tells you how many groups there are.
  • mutate() can set to NULL the first column (used to segfault, #1329) and
    it better protects intermediary results (avoiding random segfaults, #1231).
  • mutate() on grouped data handles the special case where for the first few
    groups, the result consists of a logical vector with only NA. This can
    happen when the condition of an ifelse is an all NA logical vector (#958).
  • mutate.rowwise_df() handles factors (#886) and correctly handles
    0-row inputs (#1300).
  • n_distinct() gains an na_rm argument (#1052).
  • The progress bar used by do() now respects the global option
    dplyr.show_progress (default is TRUE) so you can turn it off globally
    (@jimhester #1264, #1226).
  • summarise() handles expressions that return heterogeneous outputs,
    e.g. median(), which sometimes returns an integer and other times a
    numeric (#893).
  • slice() silently drops columns corresponding to an NA (#1235).
  • ungroup.rowwise_df() gives a tbl_df (#936).
  • More explicit duplicated column name error message (#996).
  • When "," is already being used as the decimal point (getOption("OutDec")),
    use "." as the thousands separator when printing out formatted numbers
    (@ilarischeinin, #988).
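
As a rough illustration of the bind_rows() changes listed above, the sketch below binds plain lists directly and then uses .id to record which input each row came from. The object names (rows, batches, combined) and the column name "source" are made up for this example, and passing a named list of data frames together with .id is an assumption about how the names are picked up.

```r
library(dplyr)

# Plain lists can be passed directly; each list is treated as one row here.
rows <- bind_rows(
  list(x = 1, y = "a"),
  list(x = 2, y = "b")
)

# A named list of data frames (names are illustrative only).
batches <- list(
  first  = data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE),
  second = data.frame(x = 3L, y = "c", stringsAsFactors = FALSE)
)

# .id adds a column holding the name of the data frame each row came from.
combined <- bind_rows(batches, .id = "source")
```

Here combined should contain a source column next to x and y, which makes it possible to trace rows back to their input after binding.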

Databases

  • db_query_fields.SQLiteConnection uses build_sql rather than paste0
    (#926, @NikNakk)
  • Improved handling of log() (#1330).
  • n_distinct(x) is translated to COUNT(DISTINCT(x)) (@skparkes, #873).
  • print(n = Inf) now works for remote sources (#1310).

Hybrid evaluation

  • Hybrid evaluation does not take place for objects with a class (#1237).
  • Improved $ handling (#1134).
  • Simplified code for lead() and lag() and made sure they work properly on
    factors (#955). Both respect the default argument (#915).
  • mutate() can set to NULL the first column (used to segfault, #1329).
  • filter() on grouped data handles indices correctly (#880).
  • sum() issues a warning about integer overflow (#1108).
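
The lead() and lag() items above can be tried with a small sketch like the following. It calls the functions on plain vectors rather than inside mutate(), so it only shows the intended behaviour of the default argument and of factor inputs, not the hybrid evaluation path itself; the variable names are invented for the example.

```r
library(dplyr)

x <- 1:3

# default fills the position that has no neighbour to borrow from.
lagged_x <- lag(x, default = 0L)   # 0 1 2
led_x    <- lead(x, default = 0L)  # 2 3 0

# Factors should stay factors, with NA in the position left empty.
f <- factor(c("a", "b", "c"))
lagged_f <- lag(f)
```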