-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-43633: [R] Add tests for packages that might be tricky to roundtrip data to Tables + Parquet files #43634
Changes from all commits
d5103dc
2f34dba
c35abd4
f28d4d8
29a88b2
2fe6301
e2952c0
62ced17
4403b59
014e4be
ba3c1bf
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
{% import 'macros.jinja' as macros with context %} | ||
|
||
{{ macros.github_header() }} | ||
|
||
jobs: | ||
extra-packages: | ||
name: "extra package roundtrip tests" | ||
runs-on: ubuntu-latest | ||
strategy: | ||
fail-fast: false | ||
env: | ||
ARROW_R_DEV: "FALSE" | ||
ARROW_R_FORCE_EXTRA_PACKAGE_TESTS: TRUE | ||
steps: | ||
{{ macros.github_checkout_arrow()|indent }} | ||
|
||
- uses: r-lib/actions/setup-r@v2 | ||
with: | ||
use-public-rspm: true | ||
- uses: r-lib/actions/setup-pandoc@v2 | ||
- uses: r-lib/actions/setup-r-dependencies@v2 | ||
with: | ||
working-directory: 'arrow/r' | ||
extra-packages: | | ||
any::data.table | ||
any::rcmdcheck | ||
any::readr | ||
any::units | ||
- name: Build arrow package | ||
run: | | ||
R CMD build --no-build-vignettes arrow/r | ||
R CMD INSTALL --install-tests --no-test-load --no-byte-compile arrow_*.tar.gz | ||
- name: run tests | ||
run: | | ||
testthat::test_package("arrow", filter = "extra-package-roundtrip") | ||
shell: Rscript {0} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,105 @@ | ||
# Licensed to the Apache Software Foundation (ASF) under one | ||
# or more contributor license agreements. See the NOTICE file | ||
# distributed with this work for additional information | ||
# regarding copyright ownership. The ASF licenses this file | ||
# to you under the Apache License, Version 2.0 (the | ||
# "License"); you may not use this file except in compliance | ||
# with the License. You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, | ||
# software distributed under the License is distributed on an | ||
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
# KIND, either express or implied. See the License for the | ||
# specific language governing permissions and limitations | ||
# under the License. | ||
|
||
skip_on_cran() | ||
|
||
# Any additional package that we test here that is not already in DESCRIPTION should be | ||
# added to dev/tasks/r/github.linux.extra.packages.yml in the r-lib/actions/setup-r-dependencies@v2 | ||
# step so that they are installed + available in that CI job. | ||
|
||
# So that we can force these in CI | ||
load_or_skip <- function(pkg) { | ||
if (identical(tolower(Sys.getenv("ARROW_R_FORCE_EXTRA_PACKAGE_TESTS")), "true")) { | ||
# because of this indirection on the package name we also avoid a CHECK note and | ||
# we don't otherwise need to Suggest this | ||
requireNamespace(pkg, quietly = TRUE) | ||
} else { | ||
skip_if(!requireNamespace(pkg, quietly = TRUE)) | ||
} | ||
attachNamespace(pkg) | ||
} | ||
|
||
library(dplyr) | ||
|
||
test_that("readr read csvs roundtrip", { | ||
load_or_skip("readr") | ||
|
||
tbl <- example_data[, c("dbl", "lgl", "false", "chr")] | ||
|
||
tf <- tempfile() | ||
on.exit(unlink(tf)) | ||
write.csv(tbl, tf, row.names = FALSE) | ||
|
||
# we should still be able to turn this into a table | ||
new_df <- read_csv(tf, show_col_types = FALSE) | ||
expect_equal(new_df, as_tibble(arrow_table(new_df))) | ||
|
||
# we should still be able to turn this into a table | ||
new_df <- read_csv(tf, show_col_types = FALSE, lazy = TRUE) | ||
expect_equal(new_df, as_tibble(arrow_table(new_df))) | ||
|
||
# and can roundtrip to a parquet file | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Minor observation: we already test elsewhere (in the R package and in C++) that writing an Arrow Table to Parquet and back preserves Arrow metadata. So you could just write arrow files. Following that reasoning further, there should be more than enough test coverage in C++ that writing and then reading IPC files preserves all information in memory. So would it be sufficient to just There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, you're right. When I first started this I did the roundtrip to parquet because |
||
pq_tmp_file <- tempfile() | ||
write_parquet(new_df, pq_tmp_file) | ||
new_df_read <- read_parquet(pq_tmp_file) | ||
|
||
# we should still be able to turn this into a table | ||
expect_equal(new_df, new_df_read) | ||
}) | ||
|
||
test_that("data.table objects roundtrip", { | ||
load_or_skip("data.table") | ||
|
||
# https://github.com/Rdatatable/data.table/blob/83fd2c05ce2d8555ceb8ba417833956b1b574f7e/R/cedta.R#L25-L27 | ||
.datatable.aware=TRUE | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This works, but note the linked advice is for this to show up in R/, not here in tests/ The advice you're following is here: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! I've updated this comment |
||
|
||
DT <- as.data.table(example_data) | ||
|
||
# Table -> collect which is what writing + reading to parquet uses under the hood to roundtrip | ||
tab <- as_arrow_table(DT) | ||
DT_read <- collect(tab) | ||
|
||
# we should still be able to turn this into a table | ||
# the .internal.selfref attribute is automatically ignored by testthat/waldo | ||
expect_equal(DT, DT_read) | ||
|
||
# and we can set keys + indices + create new columns | ||
setkey(DT, chr) | ||
setindex(DT, dbl) | ||
DT[, dblshift := data.table::shift(dbl, 1)] | ||
|
||
# Table -> collect | ||
tab <- as_arrow_table(DT) | ||
DT_read <- collect(tab) | ||
|
||
# we should still be able to turn this into a table | ||
expect_equal(DT, DT_read) | ||
}) | ||
|
||
test_that("units roundtrip", { | ||
load_or_skip("units") | ||
|
||
tbl <- example_data | ||
units(tbl$dbl) <- "s" | ||
|
||
# Table -> collect which is what writing + reading to parquet uses under the hood to roundtrip | ||
tab <- as_arrow_table(tbl) | ||
tbl_read <- collect(tab) | ||
|
||
# we should still be able to turn this into a table | ||
expect_equal(tbl, tbl_read) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought a reason you put this in the regular test suite is so that they would run in local dev if you had the packages installed. But I won't have this env var set locally. Should this check
NOT_CRAN
maybe?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah exactly
This function is a little indirect: if you have the package installed, regardless of setting the envvar, it will run. If you have the envvar set and you don't have the package installed the tests will fail. If you have not set the envvar (or if it's false), and you don't have the packages installed, the tests will be skipped. This way in most CI setups we will silently move on and not need to install these packages, but in the one job where we want to ensure we are running, we can confirm that positively.