Replies: 1 comment
If anyone has any feedback on 2.2.0, feel free to post it here 😅
Breaking changes

These are all minor breaking changes resulting from enhancements and are not expected to affect the vast majority of users.

- A new `...` argument was added to `row_to_names()`, preceding the `remove_row` argument, as part of the new `find_header()` functionality. If code previously used `remove_row` as an unnamed argument, it will now error. If code previously used the unsupported behavior of passing anything other than `TRUE` or `FALSE` to `remove_row`, unexpected results may occur.
- Microsoft Excel incorrectly has a leap day on 29 February 1900 (see https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year). `excel_numeric_to_date()` did not account for this error, and now it does. Dates returned from `excel_numeric_to_date()` that precede 1 March 1900 will now be one day later compared to previous versions (i.e., what was 1 Feb 1900 is now 2 Feb 1900), and dates that Excel presents as 29 Feb 1900 will become `as.POSIXct(NA)`. (Excel dates before the era begins do not exist (but they do with excel_date_to_numeric()) #423, thanks @billdenney for fixing)
- A minor breaking change is that the time zone is now always set for `excel_numeric_to_date()` and `convert_date()`. The default time zone is `Sys.timezone()`; previously it was an empty string (`""`). (Excel leap day bug #422, thanks @billdenney for fixing)
- `get_dupes()` results are now sorted first by descending order of `dupe_count`, then alphabetically by sorting variables. (Suggestion: get_dupes results should be sorted by dupe_count descending #493)
- There are several minor breaking changes resulting from enhancements to `adorn_ns()`:
  - The new `format_func` argument means that previous calls relying on `,,,` as shorthand to get to the `...` column selection argument may now require an extra comma.
  - `adorn_ns()` now defaults to displaying numbers of >3 digits with `big.mark = ","`, as part of the default value of the new `format_func` argument. E.g., `1234` is now `1,234`.
  - `adorn_ns()` no longer prints leading whitespace when `position = "front"`. This is not a visible change in the printed result, and it would rarely affect any code.
- When the first column of the data.frame input to `adorn_totals()` is a factor and a totals row is added to the bottom, that column now remains a factor, with "Total" or other user-specified totals name added to its factor levels (Add `adorn_totals` as level in existing factor variable #494).

New features

- `row_to_names()` now has a new helper function, `find_header()`, to help find the row that contains the names. It can be used by passing `row_number = "find_header"`. See the documentation of `row_to_names()` and `find_header()` for more examples. (fix Option for row_to_names to find the first complete row of names #429)
- `remove_empty()` has a new argument, `cutoff`, which allows rows or columns to be removed if at least the `cutoff` fraction of the data are missing. (fix Feature suggestion: function to remove columns based on missing percentage #446, thanks to @jzadra for suggesting the feature and @billdenney for fixing)
- A new function `sas_numeric_to_date()` has been added to convert SAS dates, times, and datetimes to R objects (fix Feature request: sas_numeric_to_date() function for SAS files #475, thanks to @billdenney for suggesting and implementing)
- A new function `single_value()` has been added to ensure that only a single value or missing values are present in a vector (fix Feature Request: single_value() #428)
- A new function `get_one_to_one()` has been added to find columns that map 1:1 to each other, even if the values within the columns differ (fix Feature Request: Detect Columns that Map to Each Other #291, @billdenney)
- `adorn_ns()` contains a new `format_func` argument so that the user can format the Ns to their liking, e.g., changing the `big.mark` character. (Feature request: adorn_n_formatting() #444)
- `clean_names()` can now be called on a database connection in a dbplyr code pipeline (Write a dbplyr version of clean_names? #467)

Minor features

- `make_clean_names()` (and therefore `clean_names()`) issues a warning if the mu or micro symbol is in the names and it is not or may not be handled by a `replace` argument value. (avoid collision of µg and mg becoming mg in `make_clean_names` #448, thanks @IndrajeetPatil for reporting and @billdenney for fixing) The rationale is that standard transliteration would convert `"[mu]g"` to `"mg"` when it would more typically be converted to `"ug"` for use as a unit. A new, unexported constant (`janitor:::mu_to_u`) was added to help with mu to "u" replacements.
- `excel_numeric_to_date()` now warns when times are converted to `NA` due to hours that do not exist because of daylight saving time (fix excel_numeric_to_date gives NA output for times in daylight savings time gap #420, thanks @Geomorph2 for reporting and @billdenney for fixing). It also warns when inputs are not positive, since Excel only supports values down to 1 (Excel dates before the era begins do not exist (but they do with excel_date_to_numeric()) #423).
- If a `tabyl()` or similar data.frame is sorted (e.g., with `dplyr::arrange()`), then has `adorn_totals()` and/or `adorn_percentages()` called on it, followed by `adorn_ns()`, the Ns will be sorted correctly to match the tabyl they're being adorned on. (fix Handle mismatch between tabyl and its core after it is arranged #407)
- `clean_names()` now supports all object types that have either names or dimnames (FR: support all classes by default in clean_names() #481, @DanChaltiel).
- `adorn_pct_formatting()` uses the locale-dependent value of `decimal.mark` as a decimal separator, e.g., in locales where `getOption("OutDec")` is `,` it will print percentages in the format `"12,34%"`. This character can also be set manually with `options(OutDec = ",")`. (feature suggestion: parameter to specify decimal separator on adorn_pct_formatting #451)
- `adorn_totals(where = "row")` now preserves factor class and levels of the first column of the input data.frame (Add `adorn_totals` as level in existing factor variable #494).
- `make_clean_names()` now allows for duplicate names to be returned by specifying `TRUE` to the new `allow_dupes` argument (Feature request: allow users to opt-in for duplicate names in `make_clean_names()` #495, @JasonAizkalns).
- Some warning messages now have classes so that they can be specifically suppressed with `suppressWarnings(..., classes = "the_class_to_suppress")`. To find the class of a warning you typically must look at the code where the warning is raised. (suggestion: option to silence warnings in row_to_names #452, thanks to @mgacc0 for suggesting and @billdenney for fixing)

Bug fixes

- `adorn_percentages()` was refactored for compatibility with `dplyr` package versions >= 1.1.0 (bug in adorn_percentages(denominator = "col") #490)
- When a numeric variable is supplied as the 2nd variable (column) or 3rd variable (list) of a `tabyl`, the resulting columns or list are now sorted in numeric order, not alphabetic. (tabyl sorts the table column by the integer 'name' not the number #438, thanks @daaronr for reporting and @mattroumaya for fixing)
- `tabyl()` now succeeds when the second variable is named `"n"` (Two-way tabyl where the column 'n' is being spread yields weird result #445).
- `adorn_ns()` can act on a single-column data.frame input with custom Ns supplied if the variable to adorn is specified with `...` (Trouble with adorning Ns on a single column #456).
- `adorn_totals()` on a one-way tabyl preserves the `tabyl_type` attribute so that a subsequent call to `adorn_pct_formatting()` works correctly on one-way tabyls (Bug: adorn_totals() on one-way tabyl changes attribute to two_way #523).

This discussion was created from the release janitor 2.2.0.