Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time only POSIXct columns end up a day out #58

Open
sch56 opened this issue Jun 24, 2021 · 3 comments
Open

Time only POSIXct columns end up a day out #58

sch56 opened this issue Jun 24, 2021 · 3 comments

Comments

@sch56
Copy link

sch56 commented Jun 24, 2021

Excel has a legacy of considering 29-Feb-1900 as a leap year day, when it didn't exist.
When I read a column with date/time format and value 0.25 (which appears in Excel time format as 6:00:00 AM) with readxl it creates a POSIXct column.
$ CPUTM_MKPF : POSIXct[1:336433], format: "1899-12-31 06:00:00"
When I write this back out to excel using writexl it is stored as "1900-01-01 06:00:00 UTC" which has the numeric value 1.25.

This can be terribly misleading.

I am not sure if this issue is platform dependent, but readxl appears to have addressed the anomaly. It would be great if writexl was consistent with this.

@jmcnamara
Copy link
Contributor

Libxlsxwriter takes the infamous Excel 1900 leap year into account but writxl does a direct date conversion (for performance I guess) and get bitten by it for dates before 1900-03-01.

I'll submit a fix.

@jmcnamara
Copy link
Contributor

I've pushed a fix in #59.

Here is a small working example:

> df <- data.frame(dates   = c(as.POSIXct("1899-12-31 00:00", 'UTC'),
+                              as.POSIXct("1900-01-01 00:00", 'UTC'),
+                              as.POSIXct("1900-02-28 00:00", 'UTC'),
+                              as.POSIXct("1900-03-01 00:00", 'UTC')),
+                  numbers = c(0, 1, 59, 61),
+                  check   = xl_formula(c("=A2-B2", "=A3-B3", "=A4-B4", "=A5-B5")))
> 
> writexl::write_xlsx(df, 'datetimes.xlsx')
> 
> readxl::read_xlsx('datetimes.xlsx')
# A tibble: 4 x 3
  dates               numbers check
  <dttm>                <dbl> <dbl>
1 1899-12-31 00:00:00       0     0
2 1900-01-01 00:00:00       1     0
3 1900-02-28 00:00:00      59     0
4 1900-03-01 00:00:00      61     0

Output:

screenshot

Without the fix you get this error from read_xlsx:

> readxl::read_xlsx('datetimes.xlsx')
# A tibble: 4 x 3
  dates               numbers check
  <dttm>                <dbl> <dbl>
1 1900-01-01 00:00:00       0     0
2 1900-01-02 00:00:00       1     0
3 NA                       59     0
4 1900-03-01 00:00:00      61     0
Warning message:
In read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet,  :
  NA inserted for impossible 1900-02-29 datetime

@sch56
Copy link
Author

sch56 commented Jun 25, 2021

Great response! Will definitely help when working with data out of SAP! They tend to separate date and time into separate columns, so great to end up on the right day when they are re-combined.

jeroen pushed a commit that referenced this issue Jun 25, 2021
Fix issue where Excel treats 1900 as a leapyear and add an
extra day.

Issue #58
jeroen pushed a commit that referenced this issue Jun 29, 2021
Fix leapday cutoff date from previous commit.

Issue #58
Previous PR #59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants