janitor 2.2.0 #528

sfirke (Maintainer) · 2023-02-03T16:19:07Z

Breaking changes

These are all minor breaking changes resulting from enhancements and are not expected to affect the vast majority of users.

A new ... argument was added to row_to_names(), preceding the remove_row argument, as part of the new find_header() functionality. Code that previously passed remove_row positionally (as an unnamed argument) will now error. Code that relied on the unsupported behavior of passing anything other than TRUE or FALSE to remove_row may now give unexpected results.
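As a rough illustration of the safe calling pattern, the sketch below names remove_row explicitly so that it is not affected by the new ... argument. The data frame raw and its values are invented for the example.

```r
library(janitor)

# Hypothetical messy input: the real column names sit in the first row.
raw <- data.frame(
  X1 = c("id", "1", "2"),
  X2 = c("score", "90", "85"),
  stringsAsFactors = FALSE
)

# Name remove_row explicitly; it now comes after `...`,
# so it can no longer be matched by position.
cleaned <- row_to_names(raw, row_number = 1, remove_row = TRUE)
names(cleaned)
```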
Microsoft Excel incorrectly treats 1900 as a leap year and therefore has a leap day on 29 February 1900 (see https://docs.microsoft.com/en-us/office/troubleshoot/excel/wrongly-assumes-1900-is-leap-year). excel_numeric_to_date() did not account for this error, and now it does. Dates returned from excel_numeric_to_date() that precede 1 March 1900 will now be one day later than in previous versions (e.g., what was 1 Feb 1900 is now 2 Feb 1900), and dates that Excel presents as 29 Feb 1900 will become as.POSIXct(NA). ("Excel dates before the era begins do not exist (but they do with excel_date_to_numeric())" #423, thanks @billdenney for fixing)
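A small sketch of how to check this change on your own installation, assuming the default date_system. The serial numbers are chosen to sit on either side of Excel's fictitious leap day (serial 60).

```r
library(janitor)

# Excel serial numbers around the nonexistent 29 Feb 1900 (serial 60).
serials <- c(59, 60, 61)
dates <- excel_numeric_to_date(serials)

# Per the note above, serial 60 should come back as NA, and serials
# below 61 are shifted one day later than in janitor < 2.2.0.
data.frame(serial = serials, date = dates)
```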
The time zone is now always set for excel_numeric_to_date() and convert_date(). The default time zone is Sys.timezone(); previously it was an empty string (""). ("Excel leap day bug" #422, thanks @billdenney for fixing)
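If results need to be reproducible across machines, one option is to pass tz explicitly rather than rely on the Sys.timezone() default. A minimal sketch, with an arbitrary serial value chosen for the example:

```r
library(janitor)

# 44927.5 is noon on Excel serial day 44927. Passing tz explicitly keeps
# the result independent of the machine's Sys.timezone() default.
stamp <- excel_numeric_to_date(44927.5, include_time = TRUE, tz = "UTC")
format(stamp, tz = "UTC", usetz = TRUE)
```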
get_dupes() results are now sorted first by descending order of dupe_count, then alphabetically by sorting variables. ("Suggestion: get_dupes results should be sorted by dupe_count descending" #493)
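To see the new ordering, the sketch below uses invented data in which the value with the most duplicates ("b") comes alphabetically after another duplicated value ("a").

```r
library(janitor)

# Made-up data: "b" appears three times, "a" twice, "c" once.
df <- data.frame(
  id = c("a", "a", "b", "b", "b", "c"),
  value = 1:6
)

# Rows for "b" (dupe_count 3) are listed before rows for "a" (dupe_count 2);
# "c" is not a duplicate and is dropped.
dupes <- get_dupes(df, id)
dupes[, c("id", "dupe_count")]
```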
There are several minor breaking changes resulting from enhancements to adorn_ns():
- The addition of the new argument format_func means that previous calls relying on ,,, as shorthand to get to the ... column selection argument may now require an extra comma.
- adorn_ns() now defaults to displaying numbers with more than three digits using big.mark = ",", as part of the default value of the new format_func argument. E.g., 1234 is now 1,234.
- adorn_ns() no longer prints leading whitespace when position = "front". This is not a visible change in the printed result, and it would be rare for it to affect any code.
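The following sketch shows one way to compare the new default with a custom format_func that leaves counts unformatted. The data are invented so that one cell count exceeds 999, and the custom formatter is only an example.

```r
library(janitor)

# Made-up data in which one cell count reaches 1000.
x <- data.frame(
  grp  = rep(c("a", "b"), times = c(1500, 500)),
  flag = rep(c("yes", "no"), each = 1000)
)

pct <- adorn_pct_formatting(adorn_percentages(tabyl(x, grp, flag), "row"))

# Default format_func: large counts are printed with "," as big.mark.
with_commas <- adorn_ns(pct)

# A custom format_func that drops the big.mark again.
plain <- adorn_ns(pct, format_func = function(n) format(n, big.mark = ""))

with_commas
plain
```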
When the first column of the data.frame input to adorn_totals() is a factor and a totals row is added to the bottom, that column now remains a factor, with "Total" or other user-specified totals name added to its factor levels ("Add adorn_totals as level in existing factor variable" #494).
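A quick way to check the factor behavior, using a tiny invented data.frame whose first column is a factor:

```r
library(janitor)

# Hypothetical input whose first column is a factor.
df <- data.frame(
  grp = factor(c("a", "b")),
  n = c(1, 2)
)

totaled <- adorn_totals(df, where = "row")

# The first column stays a factor and gains the totals name as a level.
class(totaled$grp)
levels(totaled$grp)
```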

New features

row_to_names() now has a new helper function, find_header(), to help find the row that contains the names. It can be used by passing row_number = "find_header". See the documentation of row_to_names() and find_header() for more examples. (fix "Option for row_to_names to find the first complete row of names" #429)
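A minimal sketch of the new option, assuming the default find_header() behavior of picking the first row with no missing values. The data frame is invented and has a title line above the real header row.

```r
library(janitor)

# Hypothetical export with a title line above the real header row.
raw <- data.frame(
  X1 = c("Quarterly report", "id", "1"),
  X2 = c(NA, "score", "90"),
  stringsAsFactors = FALSE
)

# Let find_header() locate the header row instead of hard-coding its number.
cleaned <- row_to_names(raw, row_number = "find_header")
cleaned
```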
remove_empty() has a new argument, cutoff, which allows rows or columns to be removed if at least the cutoff fraction of their values is missing. (fix "Feature suggestion: function to remove columns based on missing percentage" #446, thanks to @jzadra for suggesting the feature and @billdenney for fixing)
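For instance, the sketch below drops columns that are mostly missing. The data and the cutoff of 0.5 are arbitrary choices for illustration.

```r
library(janitor)

# Made-up data: column b is 75% missing, column c is 25% missing.
df <- data.frame(
  a = 1:4,
  b = c(NA, NA, NA, 4),
  c = c(NA, 2, 3, 4)
)

# Drop columns where at least half of the values are missing.
trimmed <- remove_empty(df, which = "cols", cutoff = 0.5)
names(trimmed)
```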
A new function sas_numeric_to_date() has been added to convert SAS dates, times, and datetimes to R objects (fix "Feature request: sas_numeric_to_date() function for SAS files" #475, thanks to @billdenney for suggesting and implementing)
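A short example for the date case. SAS counts days from 1 January 1960, and the input value here is arbitrary.

```r
library(janitor)

# A SAS date is a count of days since 1960-01-01.
sas_day <- sas_numeric_to_date(date_num = 15639)
sas_day
```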
A new function single_value() has been added to ensure that only a single value or missing values are present in a vector (fix "Feature Request: single_value()" #428)
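A sketch of both outcomes with made-up vectors: one vector that holds a single distinct value alongside NA, and one that holds two distinct values and is therefore expected to raise an error.

```r
library(janitor)

# One distinct non-missing value: that value is returned.
only <- single_value(c(NA, 7, 7))

# Two distinct values: single_value() stops with an error, caught here.
outcome <- tryCatch(
  single_value(c(1, 2)),
  error = function(e) "more than one value"
)

only
outcome
```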
A new function get_one_to_one() has been added to find columns that map 1:1 to each other, even if the values within the columns differ (fix "Feature Request: Detect Columns that Map to Each Other" #291, @billdenney)
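As an illustration with invented data, code and label below always change together, while qty follows a different pattern:

```r
library(janitor)

# Hypothetical lookup-style data: code and label map 1:1, qty does not.
df <- data.frame(
  code  = c("x", "y", "x", "z"),
  label = c("ex", "why", "ex", "zed"),
  qty   = c(1, 1, 2, 2)
)

# Returns a list with one character vector per group of 1:1 columns.
pairs <- get_one_to_one(df)
pairs
```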
adorn_ns() has a new format_func argument so that the user can format the Ns to their liking, e.g., changing the big.mark character. ("Feature request: adorn_n_formatting()" #444)
clean_names() can now be called on a database connection in a dbplyr code pipeline ("Write a dbplyr version of clean_names?" #467)

Minor features

make_clean_names() (and therefore clean_names()) issues a warning if the mu or micro symbol is in the names and it is not, or may not be, handled by a replace argument value. ("avoid collision of µg and mg becoming mg in make_clean_names" #448, thanks @IndrajeetPatil for reporting and @billdenney for fixing) The rationale is that standard transliteration would convert "[mu]g" to "mg", when it would more typically be converted to "ug" for use as a unit. A new, unexported constant (janitor:::mu_to_u) was added to help with mu to "u" replacements.
excel_numeric_to_date() now warns when times are converted to NA due to hours that do not exist because of daylight saving time (fix "excel_numeric_to_date gives NA output for times in daylight savings time gap" #420, thanks @Geomorph2 for reporting and @billdenney for fixing). It also warns when inputs are not positive, since Excel only supports values down to 1 ("Excel dates before the era begins do not exist (but they do with excel_date_to_numeric())" #423).
If a tabyl() or similar data.frame is sorted (e.g., with dplyr::arrange()) and then passed to adorn_totals() and/or adorn_percentages(), followed by adorn_ns(), the Ns are now sorted correctly to match the tabyl they are adorning. (fix "Handle mismatch between tabyl and its core after it is arranged" #407)
clean_names() now supports all object types that have either names or dimnames ("FR: support all classes by default in clean_names()" #481, @DanChaltiel).
adorn_pct_formatting() now uses the locale-dependent value of decimal.mark as the decimal separator. For example, in locales where getOption("OutDec") is ",", it will print percentages in the format "12,34%". This character can also be set manually with options(OutDec = ","). ("feature suggestion: parameter to specify decimal separator on adorn_pct_formatting" #451)
adorn_totals(where = "row") now preserves factor class and levels of the first column of the input data.frame ("Add adorn_totals as level in existing factor variable" #494).
make_clean_names() now allows for duplicate names to be returned by specifying TRUE to the new allow_dupes argument ("Feature request: allow users to opt-in for duplicate names in make_clean_names()" #495, @JasonAizkalns).
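A brief comparison of the default and the opt-in behavior, using two made-up names that clean to the same string:

```r
library(janitor)

# Two raw names that collapse to the same cleaned name.
raw_names <- c("Value", "value")

# Default: duplicates are made unique with a numeric suffix.
deduped <- make_clean_names(raw_names)

# Opt in to keeping duplicates as they are.
kept <- make_clean_names(raw_names, allow_dupes = TRUE)

deduped
kept
```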
Some warning messages now have classes so that they can be specifically suppressed with suppressWarnings(..., classes = "the_class_to_suppress"). To find the class of a warning you typically must look at the code where the warning is raised. ("suggestion: option to silence warnings in row_to_names" #452, thanks to @mgacc0 for suggesting and @billdenney for fixing)
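The mechanism can be sketched in base R alone. The class name below is hypothetical and is not one of janitor's actual warning classes; noisy_step() is a stand-in for any function that raises a classed warning.

```r
# Stand-in for a function that raises a classed warning.
# "hypothetical_janitor_warning" is an invented class name.
noisy_step <- function() {
  warning(warningCondition(
    "illustrative warning",
    class = "hypothetical_janitor_warning"
  ))
  "finished"
}

# Only warnings of the named class are muffled; others still surface.
result <- suppressWarnings(
  noisy_step(),
  classes = "hypothetical_janitor_warning"
)
result
```

Note that the argument of base R's suppressWarnings() is spelled classes.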

Bug fixes

adorn_percentages() was refactored for compatibility with dplyr package versions >= 1.1.0 ("bug in adorn_percentages(denominator = "col")" #490)
When a numeric variable is supplied as the 2nd variable (column) or 3rd variable (list) of a tabyl, the resulting columns or list are now sorted in numeric order, not alphabetic. ("tabyl sorts the table column by the integer 'name' not the number" #438, thanks @daaronr for reporting and @mattroumaya for fixing)
tabyl() now succeeds when the second variable is named "n" ("Two-way tabyl where the column 'n' is being spread yields weird result" #445).
adorn_ns() can act on a single-column data.frame input with custom Ns supplied if the variable to adorn is specified with ... ("Trouble with adorning Ns on a single column" #456).
adorn_totals() on a one-way tabyl preserves the tabyl_type attribute so that a subsequent call to adorn_pct_formatting() works correctly on one-way tabyls ("Bug: adorn_totals() on one-way tabyl changes attribute to two_way" #523).

This discussion was created from the release janitor 2.2.0.

sfirke (Maintainer, Author) · 2023-02-03T16:59:08Z

If anyone has any feedback on 2.2.0, feel free to post it here 😅
